MATH COPROCESSOR

BACKGROUND OF THE INVENTION 
CROSS-REFERENCE TO RELATED APPLICATION 

The following co-pending and co-assigned applications
contain related information and are hereby incorporated by
reference:

Serial No. 09/ (Attorney Docket No. 1043-EP
[2836-P099US]), entitled "SYSTEM ON A CHIP", filed
, 2000;

Serial No. 09/ (Attorney Docket No. 1044-EP
[2836-P102US]), entitled "CLOCK GENERATOR", filed
, 2000; and

Serial No. 09/ (Attorney Docket No. 1039-EP
[2836-P104US]), entitled "VOLTAGE LEVEL SHIFTER", filed
, 2000.

FIELD OF THE INVENTION 

The present invention relates in general to electronic 
circuitry and in particular to math coprocessors. 

DESCRIPTION OF THE RELATED ART 

Sophisticated design and fabrication techniques are 
rapidly making practical systems-on-a-chip a reality. In 
turn, a broad range of personal and commercial hand-held 
appliances can be constructed which embody a high degree of 
functionality. These appliances include personal digital 
assistants, personal digital music players, compact 
computers, point of sale devices, and Internet access
devices, to name only a few of the possibilities. 

A number of factors must be addressed when designing a 
system-on-a-chip. Among other things, the device must be 
capable of interfacing with a broad range of input/output 
devices which may be required to support various potential 
user-defined applications. Moreover, the device must be 
power efficient while operating at high clock speeds. 
Additionally, this device should have a large address space 
to flexibly support a range of possible memory 
configurations and sizes. 



SUMMARY OF THE INVENTION 

According to one embodiment of the principles of the
present invention, a mathematics coprocessor is disclosed
which includes a multiplier accumulator unit having a
multiplier array for selectively multiplying first and
second operands, the first and second operands having a data
type selected from the group including floating point and
integer data types. An adder is included for selectively
performing addition and subtraction operations on third and
fourth operands. The third and fourth operands are
selectively presented to the inputs of the adder by
multiplexer circuitry which selects from the contents of a
set of associated source registers, data output from the
multiplier array, and data output from the adder.

Among the many advantageous features of this math
coprocessor is the multiplier-accumulator unit, which
performs both floating point and integer arithmetic
operations. Moreover, the mathematics coprocessor can
perform both single and double precision arithmetic
operations on either floating point numbers or integers. In
addition, a mathematics coprocessor instruction set supports
such arithmetic operations as integer-to-floating-point
conversion, single-precision-to-double-precision
conversion, left- and right-shifts, absolute value, and
negate.



BRIEF DESCRIPTION OF THE DRAWINGS 

For a more complete understanding of the present 
invention, and the advantages thereof, reference is now made 
to the following descriptions taken in conjunction with the 
accompanying drawings, in which: 

FIGURE 1 is a diagram of a microprocessor-based 
system-on-a-chip embodying the principles of the present 
invention; 

FIGURE 2 illustrates a block diagram of a preferred 
microprocessor core; 

FIGURE 3A illustrates a more detailed functional block 
diagram of the DMA engine; 

FIGURE 3B is a more detailed functional block diagram
of a selected LFSR;

FIGURE 3C is a detailed block diagram of the Test
Interface Controller (TIC) harness emphasizing the
connections to the DMA engine;

FIGURE 4A is a functional block diagram of the graphics
portion of the raster/graphics engine;

FIGURE 4B illustrates the circuitry implicated in the
preferred graphics engine test configuration;

FIGURE 4C illustrates in further detail a block diagram
depicting the raster engine portion of the raster/graphics
engine;

FIGURE 4D illustrates a configuration for testing the
various primary blocks of the raster engine using the TIC
harness;

FIGURE 5A sets out an exemplary Type II EtherNet
frame/packet format for purposes of discussing the EtherNet
MAC;

FIGURE 5B generally depicts the transmission process
through the EtherNet MAC;

FIGURE 5C is a state diagram illustrating a preferred
Carrier Deference procedure used in the operation of the
EtherNet MAC;

FIGURE 5D depicts a schematic block diagram of the hash 
filter used in the EtherNet MAC; 

FIGURE 5E depicts a preferred receive descriptor format
and frame fragment chaining;

FIGURE 5F depicts a preferred formatting for the
receive status queue;

FIGURE 5G illustrates the receive data flow through the
EtherNet;

FIGURE 5H illustrates the hardware-software
interaction during the EtherNet receive process;

FIGURE 5I illustrates an exemplary state of the receive
queues following the reception of four frames;

FIGURE 5J depicts a preferred receive frame
pre-processing procedure;

FIGURE 5K depicts a preferred transmit descriptor format
and exemplary data fragments;

FIGURE 5L illustrates an exemplary specific case of an 
EtherNet transmission where one frame is transmitted from 
three fragments; 

FIGURE 5M illustrates the EtherNet transmit status 
queue format; 

FIGURE 5N illustrates the general EtherNet transmit
flow;

FIGURE 5O illustrates the hardware-software
interaction during the EtherNet receive process;

FIGURES 6A-6D depict exemplary schematic diagrams of
4-, 5-, 7- and 8-wire touchscreen input/output devices;

FIGURES 6E-6F are electrical schematic diagrams
showing the typical circuit connections to the system
touchscreen interface for an 8-wire touchscreen embodiment;

FIGURE 6G illustrates the configuration in which a
voltage is being driven across the Y-axis and the
X-terminals and sampled against a feedback signal;

FIGURE 6H illustrates the system configuration in which
all input lines to the A/D converter are being discharged to
ground;

FIGURE 6I illustrates an operational flow chart describing
a preferred method of decoding a touchscreen entry;

FIGURE 6J illustrates the touch detection configuration
for a 7-wire touchscreen embodiment;

FIGURES 6K-6M respectively show exemplary
configurations during Y-axis scan, X-axis scan, and line
discharge for the 7-wire touchscreen embodiment;

FIGURE 6N illustrates a preferred procedure for
scanning the touchscreen and determining touch location in
reference to the resistive scanning block diagram of FIGURE
6N;

FIGURE 6O illustrates a typical system configuration
during low power operation using the 5-wire device as an
example;

FIGURE 6P illustrates an exemplary system configuration
for determining battery voltage;

FIGURE 6Q depicts the touch controller TIC harness
connections for the preferred embodiment;

FIGURE 7 illustrates one preferred bit slice circuit
suitable for use in the Interrupt Controller;

FIGURE 8A depicts an exemplary 16-bit timer found
in the General Timer Block;

FIGURE 8B depicts an exemplary 32-bit timer found in
the General Timer Block;

FIGURE 8C is a functional block diagram of the timer 
block TIC harness connections; 

FIGURE 9A is a functional block diagram of the keyboard 
scan circuitry; 

FIGURE 9B shows an exemplary 8 row and 8 column
keyboard for purposes of describing the keyboard scan
circuitry;

FIGURE 9C is a functional block diagram of the keyboard
scan block connections to the TIC harness;

FIGURE 10A depicts an exemplary connection of the
system with an external EEPROM through the EEPROM/I2C
interface;

FIGURE 10B illustrates the minimum timing relationship
between the clock and data in the preferred EEPROM/I2C
interface;

FIGURE 11A depicts a preferred dual codec serial 
interface; 

FIGURE 11B illustrates the centric loop backs where the 
loop back begins at the transmit buffers and ends at the 
received buffers; 

FIGURE 11C illustrates an exemplary analog-centric
loop back where the loop back starts and ends in the analog
domain;

FIGURE 12 illustrates the Test Interface Controller 
(TIC) harness emphasizing the connections to the watchdog 
timer; 

FIGURE 13 is a high level functional block diagram of a 
math coprocessor included in the preferred embodiment of 
system; 

FIGURE 14 is a schematic showing in further detail
the primary data processing blocks including an
integer/floating point comparator (FCMP) block;

FIGURE 15 is a schematic showing in detail a floating
point adder (FADD);

FIGURE 16 is a schematic showing in further detail an
integer/floating point multiplier and multiply accumulator
with an integral adder (MMAC);

FIGURE 17 is a flow chart showing the
Instruction Decode and Operands fetch stage where the
current instruction is decoded and operands are loaded into
the source registers; and

FIGURE 18 is a flow chart describing exemplary integer
operations in the MMAC.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The principles of the present invention and their 
advantages are best understood by referring to the 
illustrated embodiment depicted in FIGURES 1-18 
of the drawings, in which like numbers designate like parts. 

FIGURE 1A is a diagram of a microprocessor-based
system-on-a-chip 100 embodying the principles of the present
invention. System 100 is a general purpose processing
device suitable for use in a number of high performance 
personal and commercial information processing systems 
requiring small device size and low power consumption. 
Among other things, system 100 may be embodied in personal 
portable appliances, such as handheld music players, 
portable Internet appliances and personal digital 
assistants, commercial portable appliances such as portable 
point-of-sale terminals, as well as intelligent peripherals, 
telecommunications appliances and compact computers. 

In the preferred embodiment, system 100 is based on ARM 
920T microprocessor core 101 operating in conjunction with a 
set of on-chip peripheral devices via an AMBA High Speed Bus 
(AHB or peripheral bus high speed bus) 102 and an AMBA 
Advanced Peripheral Bus (APB) 103. The peripheral set will 
be discussed further below. A block diagram of 
microprocessor core 101 is shown generally in FIGURE 2; 
specific details are set out in the ARM920T data sheet 
available from ARM, Ltd., Cambridge, United Kingdom, 
incorporated herein by reference. Additionally, detailed 
specifications for AHB 102 and APB 103 are also available 
from ARM, Ltd., such specifications also incorporated herein
by reference. 

The functional blocks 104 - 130 described in detail
below, as well as microprocessor core 101, are preferably
coupled to buses 102 and 103 using tri-state buffering. A
conceptual drawing of a preferred tri-state implementation
is shown in FIGURE 1B. Here, each output (data, address, or
control signal) 131 from a given source block (101, 104
- 130) is coupled to the input of one or more corresponding
destination blocks (101, 104 - 130) by a single conductor
132 through a tri-state buffer 133. One source block is
allowed to drive the given bus 102/103 while the outputs of
the remaining source blocks are held in a tri-state or high
impedance state. Thus, the timing of the activation and
deactivation of the source block outputs is critical to
avoid collisions.

In the preferred embodiment, the current bus master
grants the privilege to a selected source block to drive the
bus for a given number of cycles. An idle cycle is inserted
at the start of each burst of information to allow for the
return of responsive information from the destination
(slave) devices from the previous cycle. An idle cycle is
also inserted before a new bus master takes control of the
bus. During this idle period, addresses and control
signals are preferably not driven on the bus, with the
exception of the requisite transfer control signals.

The tri-state buffer approach has substantial 
advantages over other bus interface techniques such as 
multiplexing and logical gating. Among other things, the 
tri-state approach requires less logic to implement.
Additionally, die area is saved which helps reduce the 
overall cost of the device. 

As shown in FIGURE 2, microprocessor core 101 includes
a reduced instruction set computing (RISC) processor and one
or more coprocessors shown collectively at block 200. In
this embodiment, the available cache comprises both an
instruction cache 201 and a data cache 202. Similarly,
separate instruction and data MMUs 203 and 204 are used.
The instruction modified virtual address (IMVA), instruction
physical address (IPA) and instruction data (ID) buses are
each 32 bits wide. Similarly, the data modified virtual
address (DMVA), data physical address (DPA) and data data
(DD) buses are 32 bits wide. Physical addresses and data
are exchanged with AHB bus 102 through AMBA bus interface 205.
A write buffer 206 allows for the parallel exchange of data
through interface 205 during processor core operations.
Data from data cache 202 can be output through write-back
physical address (PTAG) RAM 207.

System boot ROM 104 operates from high speed bus 102
and controls the selection of the external source of program
code from which system 100 operates. In the preferred
embodiment, boot ROM 104 comprises 16 KBytes of
mask-programmed memory. The external source could be, for
example, flash memory. Program code under one boot option is
directly executed from external flash memory.
Alternatively, a loader program is downloaded through UART1
or the PCMCIA (both discussed below) into SDRAM. This
loader program in turn downloads a complete operating image
through either the UART1, PCMCIA, USB, or IrDA ports or the
EtherNet interface and typically stores that image in flash
memory. Additionally, in the preferred embodiment, the boot
ROM code does not enable the microprocessor memory
management unit (MMU). The loader program therefore
operates from physical addresses and handles the tasks of
initializing the page tables and starting the MMU and
caches.

A multiple-channel Direct Memory Access (DMA) engine
105 also operates off high speed bus 102. A more detailed
functional block diagram of the DMA engine is shown in FIGURE
3A. In the illustrated embodiment, DMA engine 105 comprises
8 processing paths 300 - 307 corresponding to 8 channels 0 -
7. Each DMA pathway is independently programmable with
respect to source and destination addressing. Resource
requests are received from the requesting devices, such as
the UARTs discussed below, via a 16-bit wide Request bus 301.
The various resources connected to resource bus 308 are then
associated with a given channel by setting bits in
corresponding DMA control registers 313. Simultaneous
memory access requests are resolved by an 8-way arbiter 309
and multiplexer 310. Additionally, DMA engine 105 includes
4 Linear Feedback Shift Registers (LFSRs) 314 - 317 for
performing CRC error correction.

Generally, a DMA operation proceeds as follows. In
considering any DMA operation in the preferred embodiment,
it must be recognized that the AHB has a pipelined
architecture for both addresses and data and that any DMA
channel can generate an internal request to AHB bus master
311 for access to AHB bus 102. When access to the bus is
granted, arbiter 309 selects the channel to be serviced by
the bus.

The selected channel begins its access at the source
location address driven on the bus during the previous bus
cycle. If DMA engine 105 was not the bus master for the
previous cycle, a bus idle cycle is inserted to avoid
address bus contention problems. All channels share the
same data storage and redirect logic 312; therefore, during
the read cycle, arbiter 309 locks multiplexer 310 to the
current channel such that during the next bus cycle that
same channel can complete its access with a write cycle. The
sequence generally proceeds as follows: When the previous
bus cycle is finished, DMA engine 105 is in a ready state.
The data read cycle then executes, and data retrieved from
memory are stored internally in a temporary storage register
(block 312). Depending on the width of the incoming data,
the data register stores either a received single 32-bit
word, a received 16-bit half word which has been duplicated
to create a 32-bit word, or an incoming byte which has been
copied four times to create a 32-bit word.
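
By way of illustration only, the replication of narrow data into
the 32-bit temporary storage register can be modeled in software.
The following Python sketch is not the described hardware; the
function name replicate_to_word and its arguments are hypothetical
names chosen for this example.

```python
def replicate_to_word(value: int, width_bits: int) -> int:
    """Model the 32-bit value held in the temporary storage register.

    A 32-bit word is stored unchanged, a 16-bit half word is duplicated
    into both halves, and a byte is copied into all four byte lanes.
    """
    if width_bits == 32:
        return value & 0xFFFFFFFF
    if width_bits == 16:
        half = value & 0xFFFF
        return (half << 16) | half
    if width_bits == 8:
        byte = value & 0xFF
        return byte * 0x01010101  # same byte in each of the four lanes
    raise ValueError("width_bits must be 8, 16, or 32")
```

Because every lane carries the same datum, the slave device can
select the lane matching the target address, which is consistent
with address alignment being performed by the slave.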

At the same time, a write address is driven onto the
bus. Bus master 311 inserts a bus idle cycle when necessary
to avoid data contention. Once the write address is being
driven on the bus, the arbiter lock on the active channel is
released. During the write cycle, a single 32-bit word, two
16-bit half words, or four bytes are written on the bus as a
32-bit word. Address alignment in the case of half words
and bytes is performed by the slave device. While the write
cycle is being performed, the next read address is driven on
the bus.

DMA channels 300 - 307 are configured in register. For
each channel, a 32-bit source address pointer and a 32-bit
destination address pointer are defined to configure a
transfer. The source and destination addresses are incremented or
decremented based on the state of a set of increment and
decrement control bits assigned to each channel. (If the
increment and decrement bits are set to the same value for a
channel, the address remains the same.) The address
pointers increment or decrement by a different amount based
on the width of the transfer. The configuration registers
also control transfer word width in terms of 32-bit words,
16-bit half words, or bytes, as well as the length
definition of the given transfer.
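
The pointer update rule described above can be sketched in
software as follows. This Python sketch is illustrative only: the
function name next_address is hypothetical, and the step size is
assumed to equal the transfer width in bytes (4, 2, or 1), which
the text above does not state explicitly.

```python
def next_address(addr: int, inc: bool, dec: bool, width_bits: int) -> int:
    """Return the next 32-bit address pointer value for one transfer.

    Equal increment and decrement bits leave the address unchanged;
    otherwise the pointer moves by the assumed step size.
    """
    if width_bits not in (8, 16, 32):
        raise ValueError("width_bits must be 8, 16, or 32")
    if inc == dec:
        return addr & 0xFFFFFFFF
    step = width_bits // 8  # assumption: step equals width in bytes
    delta = step if inc else -step
    return (addr + delta) & 0xFFFFFFFF  # wrap within 32 bits
```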

DMA transfers can be either synchronized or
unsynchronized. Unsynchronized transfers are initiated by
software whenever a DMA channel is granted access to AHB 102
by setting an enable bit. Clearing the enable bit halts the
unsynchronized transfer. Synchronized transfers are
initiated by a DMA request from resource bus 308, such as
from the serial channel transmit or receive buffers. During a
synchronized transfer, when the enable bit is set, a DMA
channel will transfer data when the request line is active
and it has control of the bus.

The DMA engine further includes four 16/32 bit programmable
LFSRs 314 - 317 for calculating CRCs based on common CRC
algorithms including CRC-16, Reverse CRC-16, CRC-CCITT
(SDLC, X25, XMODEM), and reverse CRC-CCITT. In the
illustrated embodiment, LFSRs 314 - 317 are coupled to DMA
channels 0 - 3 and are correspondingly labeled CRC0 - CRC3.
The LFSRs 314 - 317 can be dedicated to their respective DMA
channel or used independently by any bus master through the
AHB register interface 327 and configuration registers 328.
Each CRC calculator may be hardware connected to its
respective DMA channel to allow DMA "through" the CRC
generator.

FIGURE 3B is a more detailed functional block diagram
of a selected LFSR 314 - 317. The LFSR includes an input
shift register 318, 16/32 bit LFSR 319, polynomial divisor
320 and counter 321. During programming, the shifting mode
for shift register 318 is selected between 8-, 16-, and
32-bit modes and the LFSR size is selected to be either 16
or 32 bits wide. The polynomial used by divisor 320 is
selected in accordance with the CRC algorithm being used.
The process is initialized by writing a seed value to LFSR
319.

Data in either an 8, 16, or 32-bit format is then input
through shift register 318 input CRC IN. In the 32- and
16-bit shift modes, the data stream is normally in a word or
half word multiple of bytes. If not, the 32 or 16 bit shift
mode is initially used and then the shifting switched to the
8 bit mode for the remaining byte(s). Once the data is
written into shift register 318, there is a delay of either
9, 17, or 33 bus clocks before the resulting data are
available at the CRC OUT port and/or new data can be written
in, for the 8, 16, or 32 bit modes respectively. The CRC
process will be discussed in further detail below in
conjunction with the description of EtherNet MAC 107.
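
For purposes of illustration, the bit-serial shifting performed
by a 16-bit LFSR can be modeled in software. The Python sketch
below assumes the CRC-CCITT (XMODEM) variant, that is, polynomial
0x1021, a zero seed, and most-significant-bit-first shifting; the
names crc16_ccitt and LATENCY_CLOCKS are hypothetical, and the
sketch is not a description of the actual circuitry of LFSR 319.

```python
# Bus clocks of delay per shift mode, taken from the text above
# (one clock per shifted bit plus one).
LATENCY_CLOCKS = {8: 9, 16: 17, 32: 33}


def crc16_ccitt(data: bytes, seed: int = 0x0000) -> int:
    """Bitwise CRC over data using polynomial 0x1021, MSB first."""
    poly = 0x1021
    crc = seed & 0xFFFF
    for byte in data:
        crc ^= byte << 8              # feed the next byte into the top
        for _ in range(8):            # one LFSR shift per data bit
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

Writing a different seed or selecting a different polynomial
corresponds, in this model, to changing the seed and poly values.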

Under the default priority scheme, channel 0 has
highest priority, channel 1 the next highest, and so on
until channel 7, which has the lowest priority, assuming
that the DMA channels correspond to requests REQ 0-7. This
priority scheme may be reprogrammed in register, in which
case, more than one channel can have the same priority, with
reversion to the default scheme when that level has the
current highest priority.
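
The selection rule described above can be sketched in software as
follows. This Python sketch is illustrative only: the function
name select_channel is hypothetical, and it is assumed that a
numerically lower priority value denotes higher precedence and
that ties within a level fall back to the default channel order.

```python
def select_channel(requests, priorities=None):
    """Return the requesting channel number that wins arbitration.

    requests   -- sequence of 8 booleans, one per channel 0-7
    priorities -- optional sequence of 8 integers (lower value wins);
                  defaults to the channel number itself
    Returns None when no channel is requesting.
    """
    if priorities is None:
        priorities = list(range(len(requests)))  # default scheme
    candidates = [
        (priorities[ch], ch) for ch, req in enumerate(requests) if req
    ]
    if not candidates:
        return None
    # Tuples compare by priority value first, then by channel number,
    # so equal-priority channels revert to the default ordering.
    return min(candidates)[1]
```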

Round robin shifting by arbiter 309 supports rotation
of priority level precedence as well as the shifting of
precedence within a given priority level when two or more
channels have the same priority level. In the case of
overall rotation, the priority associated with each priority
value changes in a round robin fashion in response to the
HCLK, so long as no channels have been granted the bus or if
the bus has been granted to a channel but the arbiter has
not been locked. Between channels set to the same level,
priority changes periodically with the HCLK if no channels
have been granted the bus or if the bus has been granted to
a channel but the arbiter has not been locked. A
combination of the two schemes can be used to optimize
performance. Notwithstanding, lowest priority values are
still assigned to the most critical channels.

FIGURE 3C is a detailed block diagram of the Test
Interface Controller (TIC) harness as it relates to testing
the DMA engine. The test interface is generally shown in block
322 in FIGURE 3A. Testing is effectuated through AHB
interface registers 323 and a corresponding set of
multiplexers. The various subblocks, such as DMA channels
300 - 307, CRC generators 314 - 317 and arbiter 309 can be
tested individually or in parallel. Input signals are
written to the test input stimulus registers 324 or fed back
from the output capture registers 326 and passed to the
blocks under test through multiplexers 325. The
corresponding test outputs are read from the DMA test output
capture registers 326.

The graphics engine of raster/graphics engine block
106 generally offloads graphics processing tasks from
processor core 101, operating off high speed bus 102 as
either the bus master or as a register slave. Among other
things, the graphics engine performs rectangular block fills,
Bresenham line drawing and pixel step line drawing. Data
transfers are made by graphics engine 106 through bit-block
transfers (BitBLTs), similar to the DMA transfers discussed
above. A functional block diagram of graphics engine 106
is provided as FIGURE 4A.

As briefly indicated, AHB interface 401 interfaces
graphics engine 106 with high speed bus 102 in either the
bus master or register slave modes. As the bus master, the
graphics engine can access all user accessible areas of the
system 100 memory map, including, but not limited to, the
available graphics and video memory. This advantageously
allows for block storage, such as for fonts or bit-mapped
display data, anywhere in the system memory. Pixels are
organized according to the Device Independent Bitmap
standard format and can be stored as either 1, 4, 8, 16, or
24 bits per pixel.

Data path 402 performs masking operations such as pixel
bit plane inversion, pixel bit plane addition, and pixel bit
plane subtraction. Transparency logic is provided at the
backend of the graphics data path for background
preservation. In the preferred embodiment, mask logic
operations are performed first, followed by destination
logical combination, and then replacement of destination
pixels based on source transparency description. Line
pattern circuitry supports both Bresenham and pixel step
line draws.
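
For reference, the well-known Bresenham line algorithm can be
expressed in software as shown below. This Python sketch is a
generic integer formulation offered only as an illustration; it
is not a description of the line pattern circuitry, and the
function name bresenham_line is hypothetical.

```python
def bresenham_line(x0: int, y0: int, x1: int, y1: int):
    """Return the list of pixel coordinates from (x0, y0) to (x1, y1).

    Uses only integer addition and comparison, stepping in X and/or Y
    in either direction, one pixel per iteration.
    """
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1   # X step direction
    sy = 1 if y0 < y1 else -1   # Y step direction
    err = dx + dy               # running error term
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:            # step along X
            err += dy
            x0 += sx
        if e2 <= dx:            # step along Y
            err += dx
            y0 += sy
    return points
```

The bidirectional X and Y stepping in this sketch corresponds
loosely to incrementing or decrementing an address in each axis.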

The graphics engine address path 403 includes both X
and Y bidirectional incrementation circuitry for
effectuating these line draws. For block operations, a set
of registers are programmed to define the width of the
source block and the destination block width and height.
The destination block width is the same as the source block
width when unpacked source pixels are being transferred to a
destination block of the same size and having the same
starting pixel. Additional registers define the memory
organization for the source and destination blocks in terms
of line length, indicate whether the source data is packed,
define the pixel depth in bits per pixel, and the count
direction of incrementation.

The graphics engine can also be tested using the Test
Interface Controller (TIC). The circuitry implicated in the
preferred graphics test configuration is shown in FIGURE 4B.
Similar to the DMA test harness, the graphics engine test
harness is controlled via AHB slave interface registers
404. Test vectors are written into test input stimulus
registers 405 and then switched by multiplexers 406 to
either the graphics engine data path, shown generally by
block 407, and/or the graphics engine address path, shown
generally by block 408. Multiplexers 406 may also be used
to pass pixel mode sideband signals through the graphics
path during test. The resulting output data is then held in
test output capture registers 409 where they can be fed back
through multiplexers 406 or transmitted on high speed bus
102 via bus mastering circuitry 410.

The raster engine portion of raster/graphics engine 106
drives analog CRTs or digital LCDs, including non-interlaced
flat panel and dual scanning devices. It can also support
an optional interface to an NTSC encoder. The raster engine
also preferably processes pixels in the DIB format, although
those data do not necessarily have to be in a packed line
architecture. Pixels can be in any one of a number of
standard 4, 8, 16 or 24 bpp formats.

The raster engine also includes dedicated AMBA video
bus master/transfer interface 411 which interfaces the
raster engine and high speed bus 102. Moreover, the raster
engine connects to the DRAM controller through a dedicated
DMA port allowing video images to be read directly from memory
and loaded into a video FIFO within video data path 412.

The video FIFO generally maintains the video data
stream from image memory (video frame buffer) to the video
output circuitry without stalling. The video frame buffer
can be either in main memory or a dedicated video memory
area (which can be designated anywhere in the memory map).
Generally, when the FIFO is less than half full, data are
read from the video frame buffer until the FIFO is full, at
which time the video data fetch halts. Once the FIFO again
goes below half full, the process repeats itself, with more
data retrieved from the frame buffer.
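
The half-full refill behavior described above may be modeled, for
purposes of illustration, by the following Python sketch. The
class name VideoFifoModel, the FIFO depth, and the per-step drain
and burst sizes are hypothetical values chosen for the example and
are not taken from the preferred embodiment.

```python
class VideoFifoModel:
    """Toy model of a FIFO that refills from below half full to full."""

    def __init__(self, depth: int):
        self.depth = depth
        self.level = 0          # entries currently held
        self.fetching = False   # True while a refill is in progress

    def step(self, drained: int = 1, burst: int = 4) -> int:
        """Advance one time step and return the new fill level."""
        # The display side removes entries first.
        self.level = max(0, self.level - drained)
        # Start fetching when the FIFO drops below half full.
        if self.level < self.depth // 2:
            self.fetching = True
        # While fetching, keep reading until the FIFO is full.
        if self.fetching:
            self.level = min(self.depth, self.level + burst)
            if self.level == self.depth:
                self.fetching = False
        return self.level
```

The two thresholds provide hysteresis: fetches occur in long
runs rather than one entry at a time.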

Video data path 412 additionally includes blink control
logic, a grayscale generator, cursor generation logic and a
pair of color look-up tables. One look-up table is inserted
into the video pipeline while the other is accessible for
update via bus 102. Multiplexers select between pixel data
from the color look-up tables, the grayscale generator,
cursor logic, and the blinking control logic. The output
section of video data path 412 preferably includes a YCrCb
encoder for interfacing with an NTSC encoder and output
shift logic which allows multiple pixels to be shifted out
each clock.

The raster engine also embodies hardware cursor
generation circuitry which is based on a dedicated cursor
AMBA bus master and independent cursor address counters. As
a result, the cursor can be stored anywhere in the available
memory space associated with high speed bus 102. Cursor
size, location and color are register programmable.

The raster engine includes circuitry 413 which
generates the vertical and horizontal synchronization and
blanking signals necessary to drive the display, as well as
the pixel clock SPCLK. A pulse width modulated brightness
control signal is also generated which, when used with an
external resistor and capacitor, is used to generate a DC
brightness control voltage level.

The various primary blocks of the raster engine can be
tested using the TIC harness shown in FIGURE 4D. Test input
stimulus registers 414 are loaded from AMBA bus 102 via
slave register interface 415. Under control of the register
contents, multiplexers 417 selectively couple either side
band input signals or feedback from the test output capture
registers 416 to the selected block or blocks under test.

An EtherNet MAC 107 is also provided on AMBA bus 102 in
the preferred embodiment. EtherNet MAC 107 supports
communications with external devices in accordance with the
EtherNet/ISO/IEC 8802-3 protocol. Under this protocol, a
"listen before talk" mechanism is employed since only one
device on a single shared medium can transmit at a time.
This access method is generally known as Carrier Sense
Multiple Access with Collision Detection (CSMA/CD). Each
station monitors its receiver for carrier activity. When
activity is detected, the medium is busy, hence that station
requiring the medium waits until the carrier is no longer
detected.

FIGURE 5A sets out an exemplary Type II EtherNet 
frame/packet format upon which the following discussion will 
be based. 

The transmission process 500 is shown generally in
FIGURE 5B, the primary procedures being carrier deference,
back-off, packet transmission, transmission of EOF and SQE
test.

The transmission of the next frame in the
first-in-first-out memory of the transmitting device is
initiated at Step 501. At Step 502, the carrier deference
procedure is run.

A preferred Carrier Deference procedure 5200 is
illustrated by the state diagram of FIGURE 5C. It should be
noted that the carrier deference procedure can be entered
from any one of the depicted states, although this procedure
can only be exited from the Interframe Gap (IFG) Complete
state 5201. In this diagram, "CRS" is the sense of the
carrier state, where a logic 0 represents no carrier sensed
and a logic 1 represents a carrier present (sensed) state.

Assume for discussion purposes that the procedure is
currently in the IFG Complete state at Step 5201. When the
line is sensed as busy, the CRS value changes from 0 to 1
and the procedure waits at Step 5202 for the CRS value to
clear. Once the line is free and the CRS value clears to
zero, either a one part or two part deferral is initiated,
as selected by setting a corresponding bit in a register.

When a two part deferral is selected, a 6.4 µsec delay
corresponding to 2/3 of one full IFG period is initiated at
Step 5203. If CRS returns to a logic 1 during this 6.4 µsec
delay (i.e. the line becomes busy), the process returns to
the line busy status (Step 5202); otherwise the procedure
proceeds to Step 5204 where a second fixed 3.2 µsec delay,
corresponding to 1/3 of one IFG period, is inserted. When
the 3.2 µsec timer completes at Step 5204, the process loops
back to the IFG Complete state 5201.

When a one part deferral is selected, a fixed 9.6 µsec
delay corresponding to a full IFG period is inserted at Step
5205. When this delay times out, the procedure returns to
Step 5201.

The 2-part deferral has an advantage for AUI connections to
either 10BASE-2 or 10BASE-5. If the deferral process simply
allows the IFG timer to complete, then it is possible for a
short IFG to be generated. The 2-part deferral prevents
short IFGs. The disadvantage of the 2-part deferral is that
the 2-part deferrals are generally longer.

After exiting the Carrier Deference procedure at Step 502,
the actual transmission of data on to the medium begins at
Step 503 (FIGURE 5B). The transmission ends with either the
transmission of the end of frame (EOF) indicator at Step
504, and the consequent transmission of a status report at
Step 505, or a collision. There are two kinds of
collisions: normal collisions (ones that occur within the
first 512 bits of the packet) and late collisions (ones
that occur after the first 512 bits). For either collision
type, the MAC engine preferably sends a 32-bit jam sequence
at Step 506, and stops transmission.

A decision is made at Step 507 as to whether the collision
was late. In the event of a late collision, the applicable
transmit status is reported at Step 508, and the
transmission is halted without a re-attempt. In the case of
a normal collision, a determination is made at Step 509 as
to whether a maximum number of normal collisions has
occurred and, if so, the transmission is aborted and the
applicable transmit status is reported at Step 510. The
number of allowable collisions is determined from bits set
in a register, and typically is either 16 or 1. If the
maximum number of collisions has not occurred, the back-off
timer is triggered at Step 511, after which the process
loops back to the Carrier Deference procedure (Step 502).

The back-off timer value is calculated using either the
ISO/IEC standard back-off algorithm or a modified back-off
algorithm, as selected by the host. The standard or
"truncated binary exponential back-off" algorithm is
generally in accordance with the formula 0 ≤ r < 2^k, where
r is a random integer representing the number of slot times
to wait before another transmission is attempted, a slot
time is equivalent to 512 bit times (51.2 µsec),
k = minimum(n, 10), and n is the nth retransmission attempt.
The modified back-off algorithm increases the delay after
each of the first three transmit collisions: 0 ≤ r ≤ 2^k,
where k = minimum(n, 10), but not less than 3, and n is the
nth retransmission attempt. The advantage of the modified
algorithm over the standard algorithm is a reduction in the
possibility of multiple collisions on any transmission
attempt, although the modified algorithm does extend the
maximum time needed to acquire access to the transmission
medium.
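
The two back-off rules can be illustrated with a short Python
sketch. This is only a software model under stated
assumptions: the function names, the `modified` flag and the
use of Python's `random` module are hypothetical and are not
taken from the described hardware, which selects the
algorithm through a register bit. A 10 Mbit/s bit time of
0.1 µsec is assumed so that one slot lasts 51.2 µsec.

```python
import random

SLOT_TIME_BITS = 512   # one slot time, in bit times
BIT_TIME_US = 0.1      # assumed 10 Mbit/s link: 0.1 usec per bit


def backoff_slots(n, modified=False, rng=random):
    """Return the random slot count r for the nth retransmission attempt."""
    k = min(n, 10)
    if modified:
        k = max(k, 3)                   # "but not less than 3"
        return rng.randint(0, 2 ** k)   # modified rule: 0 <= r <= 2^k
    return rng.randrange(2 ** k)        # standard rule: 0 <= r < 2^k


def backoff_delay_us(slots):
    """Convert a slot count into a delay in microseconds."""
    return slots * SLOT_TIME_BITS * BIT_TIME_US
```

For example, `backoff_slots(1)` yields 0 or 1, whereas
`backoff_slots(1, modified=True)` yields a value from 0
through 8, which reflects the longer early delays of the
modified algorithm.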

The host may also disable the back-off step by setting a
bit in a register. In this case, the transmitter waits for
the interframe gap (IFG) before starting transmission. It
should again be noted that for a late collision, the
transmission is aborted and no back-off is implemented.

During transmission, EtherNet MAC 107 appends the Frame
Check Sequence (FCS) to the packet, as shown in FIGURE 5A.
When enabled, a standard 32-bit FCS is used and a standard
CRC computation is performed to generate error flags and
associated interrupts, as required. For reference, the
standard polynomial for the CRC is:

G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11
       + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1.
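
As a hedged illustration, the standard 32-bit CRC generator
(exponents 32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1
and 0) can be exercised in Python as sketched below. The
all-ones preset, the least-significant-bit-first processing
and the final complement are the conventional IEEE 802.3
choices and are assumptions here, since the text does not
spell them out; the function name is hypothetical. The
result is compared against `zlib.crc32`, which implements
the same convention.

```python
import zlib

# Generator polynomial built from its exponents (the x^32 term is implicit).
EXPONENTS = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]
POLY = sum(1 << e for e in EXPONENTS) & 0xFFFFFFFF   # 0x04C11DB7
POLY_REFLECTED = int(f"{POLY:032b}"[::-1], 2)        # bit-reversed form


def crc32_bitwise(data: bytes) -> int:
    """Bit-serial CRC-32, least significant bit first (assumed convention)."""
    crc = 0xFFFFFFFF                      # preset to all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF               # final complement


if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32_bitwise(sample)), hex(zlib.crc32(sample)))
```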

EtherNet MAC 107 performs two forms of destination address
filtering, namely, perfect filtering where the address is
checked for an exact match and hashing where the address is
checked for inclusion in a group. In addition, in the
Promiscuous mode, when enabled in a register, all
destination addresses are accepted.

In the preferred embodiment, four programmable perfect
address filters are provided, as well as an "all ones
filter" for broadcast frames. A register is used to control
whether a particular filter is used, with the four filters
sharing the same address space. Preferably, the first
filter is used to filter normal EtherNet addresses, as well
as for detecting remote wake up frames and, optionally,
pause (flow control) frames. In turn, the second filter is
typically used for the recognition of pause frames, and may
also be programmed to correspond to the multicast addresses
of MAC control frames. The third and fourth filters
preferably provide extra optional address match
capabilities, for example, as extra individual address or
multicast address filters.

A schematic block diagram of the hash filter is depicted in
FIGURE 5D. Generally, the hash filter is a 64-bit Logical
(Multicast) Address Filter which performs Destination
Address (DA) filtering hashed by CRC logic. CRC logic 512
initiates a CRC computation starting at the first bit of the
current frame (i.e., the first bit of the DA, where the DA
is a packet, such as shown in FIGURE 5A, without the
preamble). CRC logic 512 includes a 32-bit shift register
with specific Exclusive-OR feedback taps. After the entire
DA has been shifted into CRC logic 512, the 6 most
significant bits of the contents of CRC logic 512 are
latched into 6-bit hash register (HR) 513. The contents of
hash register 513 are passed through a 6-bit to 64-bit
decoder 514. Each of the 64 bits from the decoder is
presented to a hash table 515 one at a time. The output of
the hash table determines whether the DA has passed the hash
filter; when true, the DA has passed hash filtering and when
false, the DA has failed the hash filter.
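
The hash lookup can be modeled in a few lines of Python. This
is a sketch only: the class and function names are
hypothetical, and `zlib.crc32` is used merely as a stand-in
for CRC logic 512, whose preset value and bit ordering are
not given here, so the index values it produces should not
be taken as those of the actual hardware. What the sketch
does show is the structure described above: a 32-bit CRC
over the 6-byte DA, its 6 most significant bits used as an
index, and a 64-bit table holding one accept bit per index.

```python
import zlib


def hash_index(da: bytes, crc_fn=zlib.crc32) -> int:
    """Return the 6 most significant bits of a 32-bit CRC over the DA."""
    if len(da) != 6:
        raise ValueError("destination address must be 6 bytes")
    return (crc_fn(da) & 0xFFFFFFFF) >> 26


class HashFilter:
    """64-bit logical address filter: one accept bit per hash index."""

    def __init__(self):
        self.table = 0

    def add(self, da: bytes) -> None:
        # Set the table bit selected by the decoded 6-bit index.
        self.table |= 1 << hash_index(da)

    def passes(self, da: bytes) -> bool:
        return bool((self.table >> hash_index(da)) & 1)
```

Because many addresses share each of the 64 indices, a set
table bit admits every address that hashes to it; the filter
is therefore imperfect by design and the host is expected to
make the final decision.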
Whenever the hash filter is passed on received good frames,
the output of the hash register 513 is presented as the
Hash Table Index. A received good frame is determined to be
one without CRC error, and which is correct in length
(64<length<1518). By setting a register bit, any received
multicast frame passing the hash filter is accepted. A
multicast frame is one which has IA[0] = 1. If a second
register bit is set, then any individual address frame
passing the hash filter is also accepted. An individual
address frame is one which has IA[0] = 0. For a frame to
pass IAHashA it must have IA[0] = 0 and pass the hash.

EtherNet MAC 107 provides special support for flow control
by the transmission and reception of pause frames. A pause
frame is a control frame that defines an amount of time for
a transmitter to stop sending frames. Sending pause frames
thereby reduces the amount of data sent by a remote station.
The MAC can detect received pause frames, and automatically
stop its transmitter for the appropriate period of time. To
be interpreted as a pause frame: (1) the Destination Address
must be accepted by one of the first two perfect address
filters; (2) the Type field must match that programmed in a
Flow Control Format register; (3) the next two bytes of the
frame (MAC Control Opcode) must equal zero; and (4) the
frame must be of legal length with a good CRC. If the frame
is accepted as a pause frame, the pause time field will be
transferred to a Flow Control Timer register. The pause
frame may be optionally passed on to the Host CPU or
discarded.

When receive congestion is detected, an EtherNet MAC 107
driver may transmit a pause frame to the remote station, to
create time for the local receiver to free resources. As
there may be many frames queued in the transmitter, and
there is a chance that the local transmitter is itself being
paused, an alternative method is provided to allow a pause
frame to be transmitted. In particular, by setting the Send
Pause bit in a Transmit Control register, a pause frame will
be transmitted at the earliest opportunity. This will occur
either immediately or following the completion of the
current transmit frame. If the local transmitter is paused,
the pause frame will still be sent, and a pause timer will
still be decremented during the frame transmission. To
comply with the standard, pause frames should be sent on
full duplex links. The MAC does not enforce this; it is left
to the driver. If a pause frame is sent on a half duplex
link, it will be subject to the normal half duplex collision
rules and retry attempts.

EtherNet MAC 107 includes a receive descriptor processor
which manages receive data frames. In particular, the host
passes descriptors to the receive descriptor processor
through a circular receive descriptor queue in a contiguous
space in host memory. EtherNet MAC 107 returns status
information through a circular receive status queue in host
memory. The two independent queues support burst transfers,
which reduce bus usage and latency. The location and
characteristics (e.g. length) of these queues are set up in
a register.

Each receive descriptor is composed of two double words
defining one data buffer entry. The first double word
contains the data buffer address and fields defining the
buffer length, the buffer index and a Not Start of Frame bit
(set by the host when a new frame is not being started, for
example, when frame fragments are being chained). Control of
the use of the descriptors is handled using the Receive
Descriptor Enqueue register (RxDEQ), where "enqueue" refers
to the action of adding descriptors to the end of an
existing queue. To enqueue receive descriptors, the CPU
writes a number of available descriptors to the RxDEQ
register, and that number is automatically added to the
existing number of available queue entries. When the MAC
reads descriptors into its own local storage (internal
buffer), the number read is subtracted from the total. The
CPU can read the total number of unread valid descriptors
left in the queue from the RxDEQ. A preferred receive
descriptor format and frame fragment chaining are
illustrated in FIGURE 5E.
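
The write-to-add behavior of the enqueue register can be
pictured with the following Python model. It is a minimal
sketch under assumptions: the class and method names are
hypothetical, and the error raised when the MAC would read
more descriptors than are available is an illustrative
choice rather than documented hardware behavior.

```python
class EnqueueRegister:
    """Software model of a write-to-add counter such as RxDEQ."""

    def __init__(self):
        self._available = 0

    def write(self, count: int) -> None:
        """CPU side: a written value is added to the running total."""
        self._available += count

    def read(self) -> int:
        """CPU side: number of unread valid descriptors left in the queue."""
        return self._available

    def consume(self, count: int) -> None:
        """MAC side: descriptors read into local storage are subtracted."""
        if count > self._available:
            raise ValueError("cannot consume more descriptors than enqueued")
        self._available -= count
```

For instance, two writes of 8 and 4 followed by the MAC
fetching 5 descriptors leave a readable total of 7. The same
accounting applies to the status and transmit enqueue
registers.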

EtherNet MAC 107 uses the Receive Status Queue to send
status messages to the host. Typically, receive status
entries are written to the queue by EtherNet MAC 107 at the
end of a header, end of a buffer or the end of a frame.
More generally, the status messages are preferably written
after the completion of the given data transfer. Receive
status messages are also formed from two double words. The
first double word includes fields indicating receive error
status, end of buffer and/or end of frame indicators,
address matching, and a hash table index, among other
things. The second double word includes fields for a
receive frame processed bit, a buffer index corresponding
to the status entry, and a frame length identifier.

The Receive Status Enqueue register is used by the CPU to
pass free status locations to the EtherNet MAC. To simplify
this process the CPU writes the number of additional free
status locations available to this enqueue register. The
MAC adds the additional count to the count of previously
available entries to determine the total number of available
receive status entries. When the MAC writes status messages
to the queue it subtracts the number written from this
total. A preferred formatting for the receive status queue
is shown in FIGURE 5F.

The receive data flow through EtherNet MAC 107 is
illustrated with reference to FIGURE 5G, and the following
table:

TABLE 1

 1.  Host Driver 516 initializes a given number of receive
     descriptors in receive descriptor queue 522.

 2.  Driver 516 sets the register field RxDEQ with the
     additional number of receive descriptors.

 3.  On-chip Descriptor Processor 517 fetches descriptors
     into internal FIFO.

 4.  The address of the next receive data buffer is loaded
     into the Receive Buffer Current Address register of
     Receive Descriptor Processor 517 from Receive
     Descriptor Registers 518.

 5.  A frame is received from the LAN medium 519.

 6.  MAC Engine 107 passes the frame data to the Receive
     Data FIFO of processor 517.

 7.  The Receive Descriptor Processor stores the frame data
     into system memory 520 (Steps 5, 6, and 7 can overlap).

 8.  End of frame status is written to the Receive Status
     Queue 521; RxSEQ decremented.

 9.  Driver 516 interrupted if interrupt conditions met.

10.  Received frame passed to the protocol stack.

11.  Driver 516 clears the Receive Frame Processed bit in
     Status Queue 521.

12.  Driver 516 writes number of entries processed in the
     status queue, freeing them for future use by the MAC
     107.

13.  After the driver 516 gets the used receive buffers back
     from the stack, the driver may repeat step 2.



Receive errors are categorized as hard errors and soft
errors. A soft error indicates that a frame was not
successfully received; this type of error must be addressed
by the host driver. Soft errors include CRC errors,
receiver over-run, frames too long, and frames too short.
Hard errors are reliability induced errors and include AHB
bus access errors, parity errors (when enabled), system
errors, and master or target aborts. Hard errors stop
receive DMA activity, and require host intervention for
recovery.

FIGURE 5H illustrates the hardware-software interaction
during the receive process. Initially the software resets
at Step 523 and the hardware is in an idle mode at Step 524.
The receive descriptor and status queues are initialized by
software at Step 525 and additional descriptors and status
entries are added to the corresponding queues at Step 526.
At Step 527, the descriptors are loaded by the hardware and
the first frame is received at Step 528.

Additional descriptors are written into the queue at Step
529. At the end of the first frame, a corresponding entry
in the receive status queue is written (Step 530). At Step
531, additional descriptors are loaded by the hardware while
another frame of data is received at Step 532. At Step 533,
the next status entry in the receive status queue is
processed by the host and additional entries are made
available by the host.

This process generally continues in a similar manner, with
the hardware updating the status queue at Step 534 and
loading new descriptors at Step 535. The software adds
additional descriptors to the descriptor queue at Step 536,
processes status entries from the status queue and then
frees entries at Step 537.

An exemplary state of the receive queues following the
reception of four frames is shown in FIGURE 5I. The first
frame uses data buffer 0 only and has two status entries
associated with it. The first status entry (status 0) is
for the reception of a receive header and the second (status
1) for the end of frame/buffer, with both status entries
pointing to the beginning of data buffer 0. The second
frame occupies two buffers (data buffers 1 and 2), and is
associated with three status entries (2, 3, and 4). Status
2 entry is for the receive header, status 3 entry for the
end of buffer 1 indicator (e.g. frame size larger than
buffer size), and status 4 entry for the end of frame/buffer
indicator. The next two frames both occupy one data buffer
each and require one status entry each. (This could be the
case for short frames which do not exceed the header size or
the buffer size.) The result is that the status queue may
be used at a different rate than the descriptor queue, based
on the type of traffic and the options selected.

A receive frame pre-processing procedure is shown generally
in FIGURE 5J. First the frame is either passed on to the
next level or discarded according to the destination address
(DA) filter 540. An accept mask 541 is then applied. A
frame is accepted when the frame data are brought into and
through the chip. Frames not passing the accept mask are
discarded. An interrupt (IE) mask 542 makes the decision on
causing an interrupt.

Transmit descriptors are passed from the CPU to the MAC via
a circular transmit descriptor queue. The location and size
of the queue are set at initialization by the host by
writing to a register. Enqueueing descriptors is the
process of adding descriptors to an existing queue and is
achieved by writing an additional number of descriptors to
the Transmit Descriptor Enqueue register. The written value
is added to the previous value to keep a running total; as
descriptors are read by the MAC, the total is decremented.
The running total is available by reading the enqueue
register. It should be noted that one frame may be
described by more than one descriptor, with the final
descriptor containing the EOF bit, and that not all the
descriptors for a frame need to be supplied at once.

A preferred transmit descriptor format and exemplary data
fragments are shown in FIGURE 5K. Transmit descriptors
preferably consist of two double words. The first double
word contains the transmit buffer address pointer. The
second double word includes the end of frame bit and the
transmit buffer index for tracking the transmit buffer with
the host. The second word also includes an abort frame bit
for terminating a frame with a bad CRC, and a buffer length
field representing the byte count in the transmit buffer.

FIGURE 5L illustrates a specific case where one frame is
transmitted from three fragments. After hardware has
acquired the medium and transmitted the preamble, fragments
0, 1 and 2 are transmitted in order for a total of 446 bytes
(39 + 388 + 19). Since the CRC bit in the first frame
fragment is clear, the hardware appends the 4 byte CRC,
making the total frame length 450 bytes. Finally, the
end-of-frame indicator is sent according to normal EtherNet
procedures.

A transmit status queue is used to pass transmit status
messages from EtherNet MAC 107 to the host. Preferably, the
status queue is also a circular queue in contiguous memory
space. The location and size of the queue are set at
initialization by the host by writing location and size data
in a register. The transmit status queue format is shown in
FIGURE 5M. Generally, one transmit status entry is posted
per transmit frame, regardless of the number of transmit
descriptors used for that frame. A preferred entry includes
a transmit frame processed bit, transmit without error bit,
frame abort and loss of CRS bit, out-of-window bit,
under-run and excessive collision bits, a field
representing the number of collisions, and the transmit
buffer index.

The general transmit flow is shown in FIGURE 5N and the
following table:

TABLE 2

 1.  The Host Protocol stack initiates a transmit frame.

 2.  The Host Driver 543 parses protocol stack buffer into
     Transmit Descriptor Queue.

 3.  Driver 543 writes number of additional entries to the
     Transmit Descriptor Enqueue (TxDEQ) register 544.

 4.  On-chip Transmit Descriptor Processor 545 fetches
     descriptor information from registers 546.

 5.  On-chip Descriptor Processor 545 initiates data move.

 6.  A frame of data fetched from system memory 520 into the
     transmit FIFO within processor 545.

 7.  Frame transmitted onto LAN medium 519 (Steps 6 and 7
     can overlap).

 8.  End of frame status written to status queue 547.

 9.  Driver 543 interrupted if interrupt conditions met.

10.  Driver 543 processes the transmit status.

11.  Driver 543 informs the protocol stack that transmit is
     complete.



Transmit error conditions are categorized as hard and soft
errors. A soft error indicates that the frame was not
successfully transmitted and requires a graceful recovery by
the host driver. Soft errors include excessive collisions
and SQE errors (if connected to a MAU). Hard errors are
typically related to reliability problems, such as AHB
errors, parity errors (if enabled), system errors, and
master and target aborts.

Hard errors cause the descriptor processor to halt
operation, allowing the host a chance to determine the cause
of error and reinitialize and restart the bus master
operations. Most soft errors do not cause the frame
processing operations to halt; the descriptor processor
simply flags the error and continues on to the next frame.
The exception is a transmit underrun, where the Underrun
Halt bit gives the option of continuing on to the next
frame or halting transmit frame processing. By halting the
transmit frame processing, the CPU has the ability to reset
the transmit descriptor processor registers to point to the
start of the failed frame and reinitialize. This will cause
EtherNet MAC 107 to reattempt transmitting the failed frame
next, thereby allowing the order of frame transmission to
be maintained.

FIGURE 5O illustrates the hardware-software interaction
during the transmit process. Initially the software resets
at Step 550 and the hardware is in an idle mode at Step 551.
The transmit descriptor and status queues are initialized by
software at Step 552 and the transmit descriptor count is
written to a register at Step 553. At Step 554, the
descriptors are read by hardware followed by a read out of
data from the system at Step 555. The first frame is then
sent at Step 556.

The transmit descriptor count is updated in the register at
Step 557. During the transmission of the first frame,
additional transmit descriptors are read from the queue at
Step 558, followed by a read of data from system memory at
Step 559. At the end of the first frame, a corresponding
transmit status entry is written to the transmit status
queue (Step 560). At Step 561, additional data are read by
the hardware while another frame of data is transmitted at
Step 562. At Step 563, the next status entry in the
transmit status queue is processed and additional entries
are made available by the host.

This process generally continues in a similar manner, with
the hardware reading descriptors from the queue at Step 564
and new data at Step 565. The software adds additional
descriptors to the descriptor queue at Step 566, processes
status entries from the status queue, and then frees entries
at Step 567. Status entries are written out at Step 568
into the status queues.

With regard to EtherNet MAC 107, interrupts can be
associated with on-chip status or with off-chip status,
off-chip status being status that has been transferred to
either the transmit or receive status queues. The status
for any outstanding interrupt events is available via two
different register addresses (Interrupt Status Preserve and
Interrupt Status Clear).

Reading the Interrupt Status Preserve field has no effect
on the bits set in the register; they may be explicitly
cleared by writing a one back to any of the bit positions.
This allows the CPU to process interrupt events across
multiple routines, only clearing the bits for which it has
processed the corresponding events.

Reading the Interrupt Status Clear address removes the
status for all outstanding events. This provides a quick
mechanism for the CPU to accept all the outstanding events
in one read, and not incur the additional I/O cycles
typically required in specifically clearing the events.
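
The two access styles can be contrasted with a small Python
model. It is a sketch under assumptions: the class name, the
method names and the example bit masks are all hypothetical,
and a single integer stands in for the shared status bits
that both register addresses expose.

```python
class InterruptStatus:
    """Model of one set of status bits seen through two addresses."""

    def __init__(self):
        self._bits = 0

    def post(self, mask: int) -> None:
        """Hardware side: latch one or more interrupt events."""
        self._bits |= mask

    def read_preserve(self) -> int:
        """Read via Interrupt Status Preserve: bits are left unchanged."""
        return self._bits

    def write_preserve(self, mask: int) -> None:
        """Write ones back to clear only the selected bit positions."""
        self._bits &= ~mask

    def read_clear(self) -> int:
        """Read via Interrupt Status Clear: return and clear all events."""
        bits, self._bits = self._bits, 0
        return bits
```

With events 0x1 and 0x4 posted, a handler may read through
the preserve address, service only 0x1 and write 0x1 back,
leaving 0x4 pending for a later routine; a single read
through the clear address would instead return 0x5 and leave
nothing pending.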

SDRAM interface 108, operating off AHB 102, is preferably
based on an ARM PL090 SDRAM controller and a set of
associated configuration registers. In the illustrated
embodiment, SDRAM interface 108 shares address bus, data bus
and DQMn signals with the SRAM controller and PCMCIA
interface, arbitrated by external bus interface circuitry
under a fixed priority scheme (SDRAM, SRAM, PCMCIA and TIC
in order from highest to lowest). Preferably, all SDRAM
accesses are performed using quad bursts.

The SRAM interface (block 109) is preferably based on an
ARM PL090 Static Memory Controller. Additionally, the SRAM
interface supports programmable base addresses and 8
external chip selects and associated mask registers. A mix
of 32-bit, 16-bit and 8-bit devices is supported.

Block 109 additionally includes a slave-only V2.1 compliant
PCMCIA PCCard Interface operating off high speed bus 102.
The PCCard Interface shares external data and address buses
with the Static Memory Interface, Dynamic Memory Interface
and the Test Interface Controller. Arbitration between
these blocks and the external resources is accomplished
through an External Bus Interface (EBI) unit. Once granted
access to the external buses, the PCCard Interface controls
the buses until the current data transfer is complete.

In the preferred embodiment, the PCCard Interface includes
a controller based on an ARM Static Memory Controller which
controls PCCard accesses to the system memory, I/O and
attribute address spaces. Dynamic bus sizing is used
wherein the transfer data width matches the target data I/O
width. Moreover, in this embodiment, multiple card accesses
are performed to complete the requested bus transfer for
either read or write card operations. For example, during a
word write to an 8-bit PCCard, the PCCard Interface performs
4 card writes. Alternatively, half-word writes to an 8-bit
card are performed using double card writes, word writes to
a 16-bit card using double card writes, and so on.
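
The number of card accesses per bus transfer follows from the
ratio of the two widths, as the Python sketch below shows.
The function name is hypothetical, and the sketch assumes a
32-bit word and 16-bit half-word, which is consistent with
the four-access example for a word write to an 8-bit card.

```python
WORD_BITS = 32        # assumed bus word width
HALF_WORD_BITS = 16   # assumed half-word width


def card_accesses(transfer_bits: int, card_bits: int) -> int:
    """Return how many card cycles complete one requested bus transfer."""
    if transfer_bits <= 0 or card_bits <= 0:
        raise ValueError("widths must be positive")
    # A transfer no wider than the card completes in a single access;
    # otherwise round up to whole card-width accesses.
    return max(1, -(-transfer_bits // card_bits))


if __name__ == "__main__":
    for transfer, card in [(WORD_BITS, 8), (HALF_WORD_BITS, 8), (WORD_BITS, 16)]:
        print(transfer, card, card_accesses(transfer, card))
```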

The PCCard Interface is configured by the system
initialization code through a corresponding set of
registers. Three of these registers are used to control
access to the memory, I/O and attribute address spaces.
Another register is used to control card detection and
interrupts and a fifth controls general interface operation.
In the default state, these registers are set to the timing
requirements compatible with the slowest PCCard and the
fastest bus speed. Additionally, the wait states for both
read and write operations are programmable between 1 and 31
AHB 102 clock (HCLK) cycles (the duration of the read and
write pulses is the number of wait states plus 3 AHB clock
cycles).

In the preferred embodiment, external address buffers and
data bus transceivers are used to meet the PCMCIA PCCard
specification. Additionally, in the preferred embodiment,
an external switch module is used to control the PCCard
power supplies. Generally, the PCCard Interface, under
firmware control, determines whether or not a PCCard is
present. If a card is inserted, an interrupt is issued to
the processor and firmware interrogates the PCCard Interface
to determine the appropriate switching of the PCCard power
supplies.

An interrupt is also generated when a change of state
occurs at the PCCard detect pins and at chip reset.
Specifically, if a card is not present at chip reset, an
interrupt is generated while if a card is present, no
interrupt is generated.

The PCCard interface preferably communicates to an 
associated PCCard slot using tri-state buffers. 

JTAG/TIC interface 110 supports testing in compliance with
IEEE Std. 1149.1-1990, Standard Test Access Port and
Boundary-Scan Architecture. The Test Interface Controller
supports on-chip testing of the various blocks on high speed
bus 102. In the preferred embodiment, testing through
interface 110 is in accordance with the specification of
ARM920T processor 101. In particular, the JTAG part of the
interface takes advantage of the ARM Multi-ICE in-circuit
emulator while the TIC portion of the interface utilizes an
ARM Test Interface Controller, which is a bus master on AMBA
bus 102 and allows an off-chip testing device access to the
AMBA peripherals.

USB Controller 111 is preferably configured for three 
root hub ports and includes an integrated transceiver. This 
embodiment complies with the Open Host Controller Interface 
Specification for USB, Revision 1.0. 

LCD DAC interface 112 provides an analog DC voltage for
driving LCD contrast controls, preferably generated from a
resistor ladder. The DAC preferably is a 64-step digital to
analog converter.

Bridge 113 interfaces high speed bus 102 with the
relatively slower AMBA Peripheral Bus (APB) 103. Bridge 113
is a slave on high speed bus 102 and the only master on
peripheral bus 103, driving addresses, data and control
signals during peripheral accesses. While bridge 113 itself
contains no registers, it does decode register selects for
all peripherals on peripheral bus 103. The preferred system
memory map is as follows.



TABLE 3

Start      End        Size         Usage
0000_0000  0000_3FFF  16K          Internal ROM Memory
                                   (Remap Low)
0000_4000  1FFF_FFFF  255.984 Meg  External DRAM Memory
                                   (Remap Low)
0000_0000  1FFF_FFFF  256 Meg      External DRAM Memory
                                   (Remap High)
2000_0000  7FFF_FFFF  1.5 G        External SRAM/Flash/ROM
                                   Memory
8000_0000  87FF_FFFF  128 Meg      Memory mapped AHB control
                                   registers
8800_0000  8FFF_FFFF  128 Meg      Memory mapped APB control
                                   registers
9000_0000  9FFF_FFFF  256 Meg      Reserved
A000_0000  A3FF_FFFF  64 Meg       PCMCIA Memory Space
A400_0000  A7FF_FFFF  64 Meg       PCMCIA I/O Space
A800_0000  ABFF_FFFF  64 Meg       PCMCIA Attribute Space
AC00_0000  AFFF_FFFF  64 Meg       Reserved
B000_0000  FFFF_FFFF  1.25 G       External SRAM/Flash/ROM
                                   Memory
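
An address decode over this map can be sketched in Python as
follows. The start and end addresses are those listed in the
memory map; the function name, the `remap_high` flag and the
shortened region labels are hypothetical and serve only to
illustrate how the Remap Low and Remap High views differ in
the lowest region.

```python
# (start, end, usage) entries; end addresses are inclusive.
REMAP_LOW = [
    (0x0000_0000, 0x0000_3FFF, "Internal ROM"),
    (0x0000_4000, 0x1FFF_FFFF, "External DRAM"),
]
REMAP_HIGH = [
    (0x0000_0000, 0x1FFF_FFFF, "External DRAM"),
]
COMMON = [
    (0x2000_0000, 0x7FFF_FFFF, "External SRAM/Flash/ROM"),
    (0x8000_0000, 0x87FF_FFFF, "AHB control registers"),
    (0x8800_0000, 0x8FFF_FFFF, "APB control registers"),
    (0x9000_0000, 0x9FFF_FFFF, "Reserved"),
    (0xA000_0000, 0xA3FF_FFFF, "PCMCIA memory space"),
    (0xA400_0000, 0xA7FF_FFFF, "PCMCIA I/O space"),
    (0xA800_0000, 0xABFF_FFFF, "PCMCIA attribute space"),
    (0xAC00_0000, 0xAFFF_FFFF, "Reserved"),
    (0xB000_0000, 0xFFFF_FFFF, "External SRAM/Flash/ROM"),
]


def decode(addr: int, remap_high: bool = False) -> str:
    """Return the usage of the region containing a 32-bit address."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address out of 32-bit range")
    low = REMAP_HIGH if remap_high else REMAP_LOW
    for start, end, usage in low + COMMON:
        if start <= addr <= end:
            return usage
    raise ValueError("address not covered by the map")
```

Under these assumptions, address 0000_1000 decodes to the
internal ROM in the Remap Low view and to external DRAM in
the Remap High view, while all addresses from 2000_0000
upward decode identically in both views.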
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Analog touch screen interface 114 performs hardware 
scanning for 4-, 5-, 7-, and 8-wire analog resistive touch 
screens. Exemplary schematic diagrams of 4-, 5-, 7- and 
8-wire touchscreens are shown in FIGURES 6A - 6D, 
respectively. In each case, when a point on the touch screen 
is depressed, front and backside conductive layers touch and 
a resistive contact is made. In the 4- and 8-wire versions, 
the contact point is identified by first driving a voltage 
on the X layer through busbars 601b and 601d from the X+ and 
X- terminals and measuring the voltage at the Y+ and/or Y- 
terminals, and then by measuring a voltage driven on the 
Y-plane Y+ and Y- terminals at the X+ and/or X- terminals. 
The results of the two measurements are compared to 
predetermined calibration voltages to determine position. 
The 8-wire version includes SX and SY lines which provide 
feedback to the associated analog to digital converter for 
use as a measurement reference. 
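
The comparison against calibration voltages can be illustrated with a short sketch. It assumes, purely for illustration, that the calibration values are the A/D codes read at the two edges of an axis and that position varies linearly between them; the function name and parameters are hypothetical and do not appear in the described embodiment.

```python
def position_from_code(code, cal_min, cal_max, extent):
    """Map an A/D code onto a coordinate in the range 0 .. extent - 1.

    cal_min and cal_max are assumed calibration codes measured at the
    two edges of the axis; a linear resistive layer is assumed.
    """
    if cal_max <= cal_min:
        raise ValueError("cal_max must be greater than cal_min")
    # Clamp readings that fall outside the calibrated range.
    code = min(max(code, cal_min), cal_max)
    # Integer interpolation between the two calibration points.
    return (code - cal_min) * (extent - 1) // (cal_max - cal_min)
```

The same function would be applied once to the X measurement and once to the Y measurement, each with its own pair of calibration codes.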

In the 5- and 7-wire embodiments, a constant voltage is 
applied at terminals V+ and V- and the Z+/- terminals are 
used for switching the X and Y axes. The signal at the 
Wiper terminal is sampled to read the position data. The 
7-wire touchscreen includes reference feedback lines to the 
associated analog to digital converter. 

FIGURES 6E - 6H are electrical schematic diagrams 
showing the typical circuit connections for an 8-wire 
touchscreen. A set of 28 switches (SW0-SW27) samples the 
voltages at the touchscreen terminals to the inputs of 
analog to digital converter 603. (In actuality, the switch 
positions correspond to bits set in a register, but for 
simplicity of discussion, circuit operation will be described 
in terms of the state of the switches.) In FIGURE 6E, the 
circuitry is in the process of detecting a touch on the 
screen. In FIGURE 6F, a voltage is being driven across the 
screen X-axis and the Y-terminals are being sampled 
referenced against the voltage on the SX feedback lines. 
Correspondingly, in FIGURE 6G a voltage is being driven 
across the Y-axis and the X-terminals are being sampled 
referenced against the voltage on the SY feedback lines. 
FIGURE 6H illustrates the configuration in which all input 
lines to A/D converter 603 are being discharged to ground. 
These states will be further described in conjunction with 
the operational flow chart of FIGURE 6I. 

The circuitry for the 4-wire touchscreen is similar to 
that for the 8-wire device described above, except the A/D 
reference voltage is internal. Additionally, the SX and SY 
inputs and associated switches are not used in the 4-wire 
case. 

One preferred procedure 600 for scanning the 
touchscreen and determining touch location is illustrated in 
reference to the flow chart of FIGURE 6I and the resistive 
scanning block diagram of FIGURE 6N. At initialization, the 
registers are loaded and the controlling state machine 622 
starts. At Step 601, the X-axis is scanned to detect a 
touch (for example, see FIGURE 6E). The relative X and Y 
axes are defined in software. This is followed by the 
discharge of all A/D input lines at Step 602 (for example, 
using the configuration of FIGURE 6H). At Step 603, a 
voltage is applied to the X-axis. For the exemplary 8-wire 
touchscreen, Vdd is asserted at the V+ terminal, ground at 
V-, and the SX+ and SX- terminals are set to the A/D 
reference voltage. A delay is inserted at Step 604 for 
settling. 
At Step 605, 4, 8, 16 or 32 samples are taken, 
depending on the state of the configuration registers. Each 
sample is compared with maximum and minimums set in 
registers 623 and 624 to determine the range of sample 
values (the stored maximum and minimum are adjusted as 
values fall between them during the comparison). Then, at 
Step 606, the difference between maximum and minimum values 
is taken and compared against a maximum deviation value set 
in duration register 625. If the maximum deviation is 
exceeded, the results are discarded and the procedure 
returns to Step 601 (thereby removing bad sampling points). 
Otherwise, a running value held in an accumulator/shift 
register 626 is divided by the number of samples taken to 
calculate an average. 
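
The sampling and range check of Steps 605 - 606 can be sketched in software as follows. This is only a behavioral illustration: the function name is hypothetical, the minimum and maximum are tracked as a plain running range, and the division is done by a right shift on the assumption that this is the role of the shift register, the sample count always being a power of two.

```python
def filter_samples(samples, max_deviation):
    """Average a burst of A/D samples, or return None if they disagree.

    samples: 4, 8, 16 or 32 integer A/D codes taken on one axis.
    max_deviation: largest allowed spread between the highest and
    lowest sample before the whole burst is discarded.
    """
    count = len(samples)
    if count not in (4, 8, 16, 32):
        raise ValueError("sample count must be 4, 8, 16 or 32")
    lowest = highest = samples[0]
    accumulator = 0
    for sample in samples:
        lowest = min(lowest, sample)
        highest = max(highest, sample)
        accumulator += sample
    if highest - lowest > max_deviation:
        return None  # bad sampling point; the scan would restart
    # Dividing by a power of two reduces to a right shift.
    return accumulator >> (count.bit_length() - 1)
```

A result of None corresponds to the return to Step 601 described above.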

If the X interrupt flag is not set at Step 607, then 
at Step 608 the difference between the average value 
(new X) and the last valid X new value in register 627 is 
taken and compared against a stored minimum value in 
register 629. If it is below this minimum value, then the 
lines are discharged and the Y-scan starts. Otherwise, a 
comparison is made against a maximum value in register 630 
at Step 609. If the calculated value is above the stored 
maximum value, then it is assumed that the touch movement 
was too far and therefore the key press was invalid. In 
this case the X new interrupt pending flag is set at Step 
611, such that Step 608 is skipped in subsequent scans, and 
the last valid X new value is taken as the X position value. 
Processing then returns to Step 601 for new samples. 

On the other hand, if the difference between the 
average value (new X) and the last X value is below the 
stored maximum, then the X interrupt flag is set and the 
average value is taken as the X value at Step 610 and line 
discharge begins. 
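
The three-way decision of Steps 608 - 611 can be summarized in a small sketch. The class, field and function names are hypothetical, and the skipping of Step 608 on later scans once the pending flag is set is deliberately not modeled; only the comparison against the stored minimum and maximum is shown.

```python
from dataclasses import dataclass


@dataclass
class AxisState:
    last_valid: int               # last valid position value
    interrupt_flag: bool = False  # set when a new position is accepted
    new_pending: bool = False     # set when a movement is rejected


def update_axis(state, average, min_move, max_move):
    """Classify a new averaged reading against the last valid value."""
    delta = abs(average - state.last_valid)
    if delta < min_move:
        return "no_change"        # below the minimum: go on to the other axis
    if delta > max_move:
        state.new_pending = True  # moved too far: treat the press as invalid
        return "rejected"         # the last valid value is kept
    state.last_valid = average    # plausible movement: accept the new value
    state.interrupt_flag = True
    return "accepted"
```

The same routine would serve the Y axis with its own state and threshold values.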

The Y position is then identified through the execution 
of Steps 613 - 620, which are essentially the same as those 
discussed above with regard to the X position 
determination, the only difference being that data are now 
taken with respect to the Y axis. Additional registers 
631-633 in FIGURE 6N support the Y-scan operations. For 
brevity, the details of these steps will not be repeated. 

At Step 621 a determination is made as to whether the 
X interrupt is pending, and when both the X and Y interrupt 
flags are set, the current stored X and Y values are taken 
as the position data and an interrupt to the host is 
generated. 

The interface to a 7-wire touchscreen device is shown 
in FIGURES 6J - 6M. The 5-wire version is similar except 
the A/D reference is generated internal to the A/D 
converter. In both embodiments, the V+ and V- are the 
static lines and the Z+/- and Z-/+ lines are used to switch 
between the X and Y axes. The A/D reference voltages are 
applied at sV+ and sV-. The touch detection configuration 
is shown in FIGURE 6J, while FIGURES 6K - 6M respectively 
show exemplary configurations during Y axis scan, X axis 
scan, and line discharge. 

The touchscreen scanning circuitry advantageously can 
be disabled during lower power operation. In this case, the 
Touch Press signal is gated to the interrupt logic when the 
touch screen controller is disabled. A typical 
configuration for this is shown in FIGURE 6O, using the 
5-wire device as an example. 

Analog switches 602 can additionally be used to measure 
the chip battery voltage and similar inputs. An exemplary 
configuration for determining battery voltage is shown in 
FIGURE 6P. 

The touch controller TIC harness 635 for the preferred 
embodiment is shown in FIGURE 6Q. The test harness 
interfaces with high speed bus 102 through APB register 
interface 636. In the test mode, test input stimulus 
registers 637 control the input of sideband signals for 
analog to digital sample data, as well as powered-down touch 
detection and the inactive state. Interrupts, the analog 
switch control signals and the outputs of the A/D converter 
are read through the output capture register 638. 

A compatible interrupt controller 115 also operates off 
of peripheral bus 103 and can handle up to 64 interrupts. 
Interrupts are defined in software to generate either 
interrupt requests (IRQs) or fast interrupt requests (FIQs) 
to processor core 101. Additionally, a thirty-two level 
hardware priority scheme is provided for assisting IRQ 
vectoring along with two levels for FIQ vectoring. 
Additional features include the ability to change the 
polarity of the active state of input interrupts, as well as 
the ability to selectively trigger interrupts off either 
rising or falling edges or voltage levels. 

A brief identification of the interrupt registers 
follows for reference. Initially, it should be noted that 
all interrupts share the same input and are then 
independently masked and mapped as IRQs or FIQs. 
Preferably, these registers are accessed using fixed offsets 
from a selected base address, determined by a decoder in bus 
bridge 113. 

The Interrupt Raw Status Registers identify active 
interrupts, prior to masking, and the Interrupt Status 
Registers identify the active interrupts after masking. The 
Interrupt Enable/Enable Set Registers are used to 
selectively enable interrupts and when read, return the mask 
values for the various interrupt sources. The Interrupt 
Enable Clear Registers are used to clear bits in the 
Interrupt Enable Registers. The Programmed IRQ Interrupt 
register sets or clears programmed interrupts. 
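
The relationship between the raw status, enable and status registers can be shown with a small behavioral model. The class and attribute names are hypothetical, and the fiq_select field is an assumed stand-in for whatever mechanism maps a source to FIQ rather than IRQ; the actual register layout is not given here.

```python
MASK64 = (1 << 64) - 1  # the controller handles up to 64 interrupt sources


class InterruptController:
    """Behavioral sketch of raw status, enable and masked status."""

    def __init__(self):
        self.raw_status = 0  # active interrupts before masking
        self.enable = 0      # one mask bit per source
        self.fiq_select = 0  # assumed: 1 = route the source to FIQ

    def enable_set(self, bits):
        """Writing ones sets the matching enable bits."""
        self.enable |= bits & MASK64

    def enable_clear(self, bits):
        """Writing ones clears the matching enable bits."""
        self.enable &= ~bits & MASK64

    @property
    def status(self):
        """Active interrupts after masking."""
        return self.raw_status & self.enable

    def irq(self):
        return bool(self.status & ~self.fiq_select & MASK64)

    def fiq(self):
        return bool(self.status & self.fiq_select)
```

Separate set and clear registers let software change one enable bit without a read-modify-write of the whole mask.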

The following Table summarizes the available interrupts 
in the preferred embodiment: 

TABLE 4

Interrupt    Name          Type            Source     Description
Bit 0        Unused        Level Only      GND        User defined
Bit 1        PROG INT      Level Only      Internal   Software Programmed Interrupt
Bit 2        COMMRX        Level Only      ARM core   Processor debug Serial Port RX Interrupt
Bit 3        COMMTX        Level Only      ARM core   Processor debug Serial Port TX Interrupt
Bit 4        INT_CT[0]     Level Only      TIMERS     Timer 1 Interrupt
Bit 5        INT_CT[1]     Level Only      TIMERS     Timer 2 Interrupt
Bits 6-8     INT_CT[4:2]   Level Only      TIMERS     Timers 5-3 Interrupts
Bit 9        INT_RTC       Level Only      RTC        Real Time Clock Interrupt
Bit 10       UARTRXINT1    Level Only      UART1      UART1 Receive Buffer Interrupt
Bit 11       UARTTXINT1    Level Only      UART1      UART1 Transmit Buffer Interrupt
Bit 12       UARTRXINT2    Level Only      UART2      UART2 Receive Buffer Interrupt
Bit 13       UARTTXINT2    Level Only      UART2      UART2 Transmit Buffer Interrupt
Bit 14       UARTRXINT3    Level Only      UART3      UART3 Receive Buffer Interrupt
Bit 15       UARTTXINT3    Level Only      UART3      UART3 Transmit Buffer Interrupt
Bit 16       INT_KEY       Level Only      KEY        Key Scan Controller Interrupt
Bit 17       INT_TOUCH     Level Only      TOUCH      Touch Scan Controller Interrupt
Bit 18       INT_GRA       Level Only      GRAPHICS   Graphics Controller Interrupt
Bit 19       INT_CIA       Level Only      PCCARD     PCCard Interrupt Signal
Bit 20       INT_VERT      Level Only      RASTER     Vertical Start of Frame Counters
Bits 28-21   INT_DMA[7:0]  Level Only      DMA        DMA channel Interrupts
Bit 29       INT_IRDA      Level Only      UART2      IrDA combined Interrupt
Bit 30       INT_USB       Level Only      USB        USB Host Controller Interrupt
Bit 31       INT_MAC       Level Only      MAC        10/100 Ethernet MAC Interrupt
Bits 35-32   INT_1[3:0]    Edge or Level   External   External Interrupts 3-0
Bit 36       INT_PROG      Edge or Level   RASTER     Programmable Interrupt within a Raster Frame
Bit 37       CLK1HZ        Edge or Level   RTC        Real Time Clock Interrupt
Bit 38       V_CSYNC       Edge or Level   RASTER     Vertical Sync Signal
Bit 39       V_CSYNC       Edge or Level   RASTER
Bit 40       INT_AC97      Level Only      AC97       AC97 Port Interrupt
Bit 41       INT_SSP0RX    Level Only      SPI0       SPI Port 0 Receive Interrupt
Bit 42       INT_SSP0TX    Level Only      SPI0       SPI Port 0 Transmit Interrupt
Bit 43       INT_SSP1RX    Level Only      SPI1       SPI Port 1 Receive Interrupt
Bit 44       INT_SSP1TX    Level Only      SPI1       SPI Port 1 Transmit Interrupt
Bit 45       INT_GPIO      Level Only      GPIO       Combined GPIO Interrupt
Bit 46       INT_CU        Level Only      CU         Customer Unit Exception Interrupt
Bit 47       INT_MMC       Level Only      MMC        MMC Combined Interrupt
Bit 48       INT_UART1     Level Only      UART1      UART1 Combined Interrupt
Bit 49       INT_UART2     Level Only      UART2      UART2 Combined Interrupt
Bit 50       INT_UART3     Level Only      UART3      UART3 Combined Interrupt
Bit 51       INT_SPI0      Level Only      SPI0       SPI Port 0 Combined Interrupt
Bit 52       INT_SPI1      Level Only      SPI1       SPI Port 1 Combined Interrupt
Bit 53       INT_I2C       Level Only      I2C        I2C Clock Input Interrupt
Bits 54-63   Unused        Level Only      GND        Not assigned


Each interrupt is associated with a bit slice circuit, 
one such circuit 700 being shown in FIGURE 7. In this 
circuit, the POLARITY signal allows for the polarity of the 
active state of the received interrupt to be reversed. Edge 
detection circuitry 701 is included for the bit slice 
circuits corresponding to the external interrupts as well as 
the interrupt issued at the vertical start of display frame. 
The FIQ and IRQ masking bits from the corresponding masking 
registers control the combination of bit slice outputs to 
generate the FIQs and IRQs to the microprocessor. 

Block 117 includes four 16-bit and two 32-bit interval 
timers, and a 40-bit time stamp debug timer. An exemplary 
16-bit timer 801 is shown in FIGURE 8A and includes a 16-bit 
down counter 802 and an 8-bit prescaler 803. Additionally, a 
5-bit global prescaler is provided for the entire circuit 
block. Load register 804 is set to the initial timer value 
and maintains the reload value during periodic operation. 
Fields in control register 805 are used for enablement, mode 
selection and prescale configuration. 

FIGURE 8B depicts one of the 32-bit timers 806. This 
timer is based on a 32-bit down counter 807 and an 8-bit 
prescaler 808. The 32-bit timers also share the 5-bit 
global prescaler. Load and control registers 809 - 810, 
similar to those described above, are also included. In 
addition, the 32-bit timers include a compare register 811 
and a comparator 812. This comparator circuitry is 
available for triggering interrupts at preselected timer 
values. 

The operation of the interval timers of block 117 can be 
described in reference to the following table: 

Eight-bit prescaling supports division by 1, 16, or 
256, depending on whether 0, 4, or 8 prescale stages are 
used. Moreover, the interval timers can each operate in 
either a free-running or periodic mode. In the free-running 
mode, the counters wrap around to their maximum value and 
continue counting down, after reaching zero. In the periodic 
mode, the counter reloads from the load register upon 
reaching zero and continues to decrement following reload, 
unless appropriate control bits are set, in which case the 
interrupt is continuously asserted until cleared. 
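
The two counting modes and the prescale division can be illustrated with a simple behavioral model. The class name and the exact clock on which the reload takes effect are assumptions made for the sketch; the global prescaler, the compare register and the interrupt latching controlled by the control bits are not modeled.

```python
class IntervalTimer:
    """Behavioral model of one down-counting interval timer."""

    def __init__(self, width, load, periodic, prescale_stages=0):
        if prescale_stages not in (0, 4, 8):
            raise ValueError("prescale_stages must be 0, 4 or 8")
        self.max_value = (1 << width) - 1
        self.load = load & self.max_value
        self.periodic = periodic
        self.divisor = 1 << prescale_stages  # divide by 1, 16 or 256
        self.count = self.load
        self._prescale = 0

    def clock(self):
        """Apply one input clock; return True when the count reaches zero."""
        self._prescale += 1
        if self._prescale < self.divisor:
            return False
        self._prescale = 0
        if self.count == 0:
            # Periodic mode reloads; free-running mode wraps to the maximum.
            self.count = self.load if self.periodic else self.max_value
        else:
            self.count -= 1
        return self.count == 0
```

Under these assumptions, a periodic timer loaded with N reaches zero once every N + 1 prescaled clocks.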

The time stamp debug timer is a 40-bit up counter clocked 
with a 1 MHz clock and is used only for long-term debugging. 

FIGURE 8C is a functional block diagram of the timer 
block TIC harness 813 which operates from APB bus 103 
through register interface 814. The clock mode, reset 
status, input multiplexer configuration and clock enablement 
are effectuated through register interface 814. The test 
input stimulus register 815 is used to control counting and 
pre-scaler carry. The pre-scaler carry signal and interrupt 
values are observed in the test output capture register 816. 

System 100 includes keyboard matrix scan circuitry 118 
operating from peripheral bus 103. In the preferred 
embodiment, a key array of up to 64 keys in 8 rows and 8 
columns is supported, with any one or two keys debounced and 
decoded at one time. FIGURE 9A is a functional block 
diagram of this embodiment. An exemplary 8 row and 8 column 
keyboard is shown in FIGURE 9B for discussion purposes. 

Precounter 901, row and column counter 902 and row 
decoder 903 sequentially pull down the keyboard row 
lines in order from Row 7 to Row 0. At the same time, the 
column lines Col. 0 to Col. 7 are passively pulled up. The 
outputs of the column lines are passed through pipeline 904 
and then decoded by column multiplexer 905 under the control 
of scan controls 906. Hence, when a key is depressed, the 
column line of the corresponding column is pulled low to the 
low voltage on the corresponding row line. 

Mechanical switch bounce is accounted for using 
programmable debounce counter 907. This counter is set to a 
predetermined scan count corresponding to a preselected 
number of scans during which the same key or pair of keys 
must be detected. The count is determined as a function of 
the expected switch bounce and the typical length of each 
scan. For example, if the potential switch bounce is 20 
milliseconds and each complete scan of the keyboard takes 8 
milliseconds, then the count is set to three, which allows 
approximately 24 milliseconds for the switch to settle. If 
the same key or pair of keys is not detected on successive 
scans during the count down period, then the scan count is 
reset. 
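
The scan count arithmetic and the reset-on-mismatch behavior can be sketched as follows. The function and class names are hypothetical, and the hardware counter is modeled here as counting matching scans upward rather than counting down.

```python
def debounce_scan_count(bounce_ms, scan_ms):
    """Smallest number of scans whose total time covers the bounce time."""
    return -(-bounce_ms // scan_ms)  # integer ceiling division


class Debouncer:
    """Report a key set only after it is seen on enough successive scans."""

    def __init__(self, scan_count):
        self.scan_count = scan_count
        self.candidate = frozenset()
        self.seen = 0

    def scan(self, keys):
        """keys: iterable of (row, column) pairs detected on this scan."""
        keys = frozenset(keys)
        if keys and keys == self.candidate:
            self.seen += 1
        else:
            # A different result (or no key at all) restarts the count.
            self.candidate = keys
            self.seen = 1 if keys else 0
        if keys and self.seen >= self.scan_count:
            return keys
        return None
```

With the figures given above, debounce_scan_count(20, 8) evaluates to 3, matching the count of three scans.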

The contents of the row and column counter (i.e. the 
coordinates of the key or keys depressed) are passed through 
a pipeline 908, a set of temporary storage registers 909 and 
then set in the key register 910, where they can be read. 
When a key depression is detected, interrupt controller 911 
generates the corresponding interrupt to processor 101. In 
the preferred embodiment, interrupts are also generated when 
keys are released. The interrupt bit is latched until key 
register 910 is read. 

Three key reset detector 912 detects depression of keys 
2, 4, and 7 in Row 0, the result of which is used by the 
watchdog subsystem to reset system 100. 

FIGURE 9C is a functional block diagram of the keyboard 
scan block TIC harness 913. Testing is conducted through 
registers 914 in the APB register interface. These 
registers are set to control the input multiplexers, reset 
status, clock mode and clock enables. Column line inputs, 
as well as the inactive mode, are controlled by test input 
stimulus registers 915. Row outputs, three key detect, back 
drive and the interrupt output are observed at the test 
output capture register 916. 

EEPROM/I2C interface 119 is shown in FIGURE 10A. 
According to one embodiment of system 100, interface 119 
supports a connection to an external EEPROM 1001 for 
inputting configuration information on system power-up. (An 
external serial EEPROM is not required for operation of 
system 100, although it may be required to meet specific 
operating system compatibility requirements.) 
Alternatively, this interface can also be used as a generic 
I2C port. 

After a hardware reset, an on-chip state machine 
attempts to load the configuration data. If an EEPROM is 
present, the first 40 bytes returned are transferred to 10 
configuration registers. The EEPROM device is then 
accessible to the host processor for reading/writing via a 
control register. If an EEPROM device is not present, or if 
the header portion of the first 40 bytes is invalid, the 
configuration registers remain in their existing state. 

As shown in FIGURE 10A, the EECLK port is used to 
provide the serial clock and the EEDAT port for serial data 
I/O. Initialization may be accomplished by a hardware 
reset. On a hardware reset, a hardware-based EEPROM 
controller: (1) enables the EEPROM interface (switches the 
mode of the EECLK pin); (2) sends a dummy write to set the 
byte address to 0; (3) starts a sequential read of bytes from 
the EEPROM; (4) checks the signature header as loaded and 
aborts if an invalid signature is detected; and (5) loads a 
fixed number of bytes, transferring data into destination 
configuration registers as loaded. 
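
The load-or-leave-unchanged behavior can be illustrated in software. The signature value, its length, the little-endian byte order and the treatment of the 40 bytes as ten 32-bit words are all assumptions made for this sketch; the text above states only that 40 bytes go to 10 configuration registers and that an invalid header aborts the load.

```python
import struct

SIGNATURE = b"CFG0"  # hypothetical header value, for illustration only


def load_config(eeprom_bytes, registers):
    """Copy the first 40 EEPROM bytes into ten registers if the header is valid.

    registers: list of ten integers, updated in place on success.
    Returns True when the registers were loaded and False when they were
    left in their existing state.
    """
    block = bytes(eeprom_bytes[:40])
    if len(block) < 40 or not block.startswith(SIGNATURE):
        return False  # EEPROM absent or header invalid
    # Assumed layout: ten little-endian 32-bit words, header included.
    registers[:] = struct.unpack("<10I", block)
    return True
```

Unlike the hardware sequence, which checks the header as the bytes arrive, this sketch validates the whole block before touching any register.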

The timing of the data and clock signals for the 
initialization load is generated by a hardware state 
machine. The minimum timing relationship between the clock 
and data in the preferred embodiment is shown in FIGURE 10B. 
Preferably, the state of the data line can change only when 
the clock line is low. A state change of the data line 
during the time that the clock line is high is used to 
indicate start and stop conditions. 
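
The start and stop rule can be shown by scanning a sampled trace of the two lines. The sketch follows the usual I2C convention, in which a falling data line while the clock is high marks a start and a rising data line marks a stop; that polarity is taken from the I2C bus convention rather than from the text above, and the function name is hypothetical.

```python
def find_conditions(trace):
    """Locate start and stop conditions in a sampled two-wire trace.

    trace: list of (clock, data) samples, each value 0 or 1.
    Returns a list of (sample_index, "START" or "STOP") tuples.
    """
    events = []
    for i in range(1, len(trace)):
        (clk_prev, dat_prev), (clk_now, dat_now) = trace[i - 1], trace[i]
        # Only a data change while the clock stays high is a condition;
        # ordinary data bits change while the clock is low.
        if clk_prev == 1 and clk_now == 1 and dat_prev != dat_now:
            events.append((i, "START" if dat_now == 0 else "STOP"))
    return events
```

Data transitions made while the clock is low, as in normal bit transfers, produce no events.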

Writing to an external EEPROM requires support from 
processor 101 and is accomplished through a corresponding 
processor-accessible configuration interface register. 

During a typical EEPROM read access sequence, a dummy 
write operation is first performed which generates a start 
condition. This is followed by the generation of a slave 
device address (including a device identifier and bank 
select bits) and a byte address of zero. The system 100 
begins the access at byte address zero and continues 
accessing one byte at a time, until a stop condition is 
detected. 

EEPROM/I2C block 119 also includes two dedicated ports 
for Flash ROM programming voltage (FVPP) control, or 
alternatively, for use as general purpose input/output. 
Logically, the FVPP block circuitry and the LED block 128 
are identical, but reside at different base addresses. LED 
interface 128 provides a dedicated control for driving 2 LED 
indicators. The LED pins can also be used as general purpose 
input/output pins if LEDs are not used. 

An AC97 / Inter-IC Sound (I2S) interface 120 is 
provided on peripheral bus 103 in the preferred embodiment 
of system 100. An on-chip multiplexer allows the user to 
select between a connection to an external AC97 codec or an 
external I2S bus. 

In accordance with the AC97 specification, interface 
120 includes a port for receiving the AC97 bit clock 
(ABITCLK) and serial data (ASDI, ASDI2) from one or two 
external AC97 codecs, as well as a port for transmitting a 
sync signal (ASYNC), serial data (ASDO) and a reset signal 
(ARSTn). Generally, the external codec generates the bit 
clock ABITCLK which is then divided down by interface 120 to 
generate the sync signal ASYNC. ASYNC signals the start of 
each audio frame, with data transmitted onto the AC97 link 
on the rising edges of the bit clock and sampled on the 
receiving end on the falling edges of the bit clock. 

In the preferred embodiment, interface 120 supports a 
dual codec architecture in accordance with the AC97 
specification, Revision 2.1. A preferred dual codec serial 
interface is shown in FIGURE 11A. Serial data is input from 
the corresponding pair of codecs through input pins ASDI and 
ASDI2 and a corresponding set of shift/data formatters 1101 
and 1102. (If only one codec is being used, the second pin 
may be used for extended GPIO functionality.) The two 
external codecs receive data through a single data output 
port ASDO supported by shift/data formatter 1103. 

The serial interface is controlled by a set of 
registers in register file 1104. Register file 1104 
includes a set of common registers for generally setting up 
the AC-link as well as AC-link registers for setting up the 
configuration of each specific link to each of the two 
external codecs. 

Interface 120 employs a double buffer mechanism for 
transferring data between the AC97 link and system memory. 
This arrangement includes four 32-bit wide receive buffers 
1105 and four 32-bit wide transmit buffers 1106, with the 
transmit buffers providing paths from system memory to the 
AC-link and the receive buffers providing paths from the 
AC-link to system memory. Each transmit and receive buffer 
is associated with a slot map register for controlling the 
exchange of data through the specified AC-link slots, as 
well as for defining the data format conversion to be used 
with the corresponding payload data. These exchanges are 
controlled either by host polling or through the DMA 
controller. In the case of polling, the host polls 
associated buffer status registers to determine whether the 
given buffers need to be filled or to be emptied through 
writes and reads. In the case of DMA operations, buffer 
status bits in register file 1104 are routed to DMA 
controller 105, which then handles any AC-link data 
requests, following initialization by system 100. 

Shift/data formatters 1101-1103, under the control of 
port timing and control logic 1107 and registers 1104, allow 
interface 120 to support multiple data formats. For 
example, monaural data can be handled as either 16-bit or 
20-bit samples, which are right justified in memory. For 
16-bit samples, a four bit left shift is performed while 
routing to the AC-link slot, and for 20-bit samples, the 20 
LSBs of each 32-bit word are passed to the AC-link slot. 
(In the preferred embodiment, data are stored in system 
memory as 32-bit words.) For stereo data, 16-bit left and 
right samples can be packed into a 32-bit word and processed 
as a single unit. These left and right samples are unpacked 
and then left-shifted to fill 20-bit AC-link slot data 
fields. Since 20-bit data cannot be packed into 32-bit 
words, stereo 20-bit data is essentially processed as two 
separate data streams. 
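
The transmit-side conversions described above amount to a few shifts and masks, sketched below. The function names are hypothetical, and placing the left sample in the upper half of a packed stereo word is an assumption; the text does not state which half holds which channel.

```python
SLOT_MASK = 0xFFFFF  # an AC-link slot carries a 20-bit data field


def mono16_to_slot(word):
    """Right-justified 16-bit sample: shift left four bits to fill 20."""
    return ((word & 0xFFFF) << 4) & SLOT_MASK


def mono20_to_slot(word):
    """Right-justified 20-bit sample: pass the 20 LSBs of the 32-bit word."""
    return word & SLOT_MASK


def stereo16_to_slots(word):
    """Unpack two 16-bit samples from one 32-bit word into two slot fields."""
    left = (word >> 16) & 0xFFFF  # assumed: left channel in the upper half
    right = word & 0xFFFF
    return mono16_to_slot(left), mono16_to_slot(right)
```

The receive direction would apply the inverse operations before a word is written to the receive buffers.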

AC97 interface 120, in the preferred embodiment, 
operates across an AC-link running at a fixed frame rate of 
48 KHz. When data is being received by interface 120, slot 
valid tag bits received in slot 0 indicate which of the 
following slots contain valid samples. Thereafter, only 
samples from slots with valid slot bits are accepted into 
the receive buffers. Therefore, by dynamically changing the 
valid slot bits, the sample frequency of the data input from 
the AC-link can be dynamically controlled. During data 
transfers to the external codecs, an on-demand scheme is 
typically employed. Generally, software running on 
processor core 101 sets output slot valid bits which 
indicate active slots and then, using a fixed sample rate, 
the serial port of interface 120 transfers samples from the 
transmit buffer to the valid slot on the link at the 48 KHz 
rate. 

Multiplexers 1108-1110 support and enhance the loop 
back modes available on AC97 compliant codecs. Bus-centric 
loop backs are illustrated in FIGURE 11B where the loop back 
begins at the transmit buffers 1106 and ends at the receive 
buffers 1105. Exemplary analog-centric loop backs are shown 
in FIGURE 11C where the loop back starts and ends in the 
analog domain. Consequently, these loop backs generally 
require external analog test equipment such as an Audio 
Precision System 2. 

In accordance with the I2S specification (Philips 
Semiconductors), the I2S interface of block 120 supports a 
digital audio link. This protocol operates on a 3-wire 
interface which includes a serial clock line, serial data 
line, and word select line. The system 100 I2S interface 
includes both a specification compliant transmitter and 
receiver. This interface can be configured as either the 
master or slave in the context of the I2S bus specification. 
When configured as the I2S master, the interface generates 
the serial clock and word select signal and outputs them on 
the ABITCLK and ASYNC pins respectively. Additionally, when 
configured as the master, the ARSTn pin is driven with a 
master clock signal, typically 256 times the word select 
rate. When configured as the I2S slave, the serial clock 
and word select signal are received as inputs on the ABITCLK 
and ASYNC pins respectively. The master clock is not used 
in a slave configuration. For either master or slave 
configurations the serial data is treated the same. Output 
data is driven onto the ASDO pin and input data is received 
on the ASDI pin. 

For I2S operation, the ABITCLK pin is used to output 
the serial clock SCLK, the ASYNC pin for the LRCLK, and the 
ARSTn pin for the master clock MCLK when interface 120 is 
operating as the I2S master (the MCLK is not used when 
interface 120 is operating as the I2S slave). 

In embodiments employing an ARM920T processor core, a 
set of general purpose input/output ports 121 are provided. 
GPIO block 121 includes 16 individually programmable I/O 
pins arranged as two 8-bit bidirectional ports. For each of 
the two ports, a data direction register and a data register 
are provided. The data direction registers allow each 
individual pin to be configured as either an input or 
output. The GPIO block further includes an interface to the 
peripheral bus which generates the read and write control 
signals necessary to access the data. 

In addition to the standard GPIO functions, GPIO block 
121 in system 100 includes enhanced capability. In 
particular, interrupts have been added to each of the GPIO 
pins, along with registers for enabling and masking the 
interrupts, status and test control registers. 

SPI interface (Synchronous Serial Interface) 122 can be 
used to communicate with an external analog to digital 
converter and/or digitizer. In the illustrated embodiment 
two SPI controllers (SPI0 and SPI1) are provided which 
support the Motorola SPI format, the Texas Instruments SPI 
format, and National Semiconductor serial formats. The SPI0 
port can be multiplexed with the AC97 pins or with the key 
matrix row pins. 

System 100 includes three universal asynchronous 
receive-transmit (UART) interfaces 123 - 125. These 
asynchronous ports can be used, for example, to communicate 
with external RS-232 transceivers. Generally, UARTs 123-125 
operate similarly to industry standard 16C550 UART 
devices. UARTs 123-125 are preferably slaves off of 
peripheral bus 103 and operate at baud rates up to 115.2 
Kbits/sec. In the preferred embodiment, UARTs 123-125 are 
based on ARM PrimeCell UART designs available from ARM Ltd., 
Cambridge, England. 

In addition to conventional receive and transmit ports, 
UART 123 (UART1) can also receive the three modem control 
signals CTS (Clear to Send), DSR (Data Set Ready), and DCD 
(Data Carrier Detect) (external modem hardware generates the 
associated modem control signals RTSn, DTRn, and RI). 
Additionally, UART1 includes an HDLC transmitter which 
performs framing and bit stuffing in accordance with the 
HDLC protocol. An HDLC receiver in UART1 performs framing, 
address matching, code substitution, CRC checking, and 
optionally, transmission of a CRC sum at end of packet. 

UART2 (124) additionally includes an IrDA (Infrared 
Data Association) SIR protocol processing stage for driving 
an infrared light emitting diode (LED) and receiving data 
from a photodiode. 

UART3 (125) is similar to UART1 except the modem 
control port is hardwired to a passive state. 

Real time clock (RTC) with Trim 126 allows software 
controlled digital compensation of a 32.768 KHz crystal 
oscillator. Advantageously, software controlled digital 
compensation allows the oscillator to be electronically 
calibrated by automatic test equipment during manufacture 
and then adjusted in the field. Specifically, an oscillator 
compensation value, including a counter preload value to act 
as an integer divider, and a value representing the number 
of 32.768 KHz clock periods to be deleted on a periodic 
interval, is determined in manufacturing by adjusting the 
frequency of the 1 Hz clock. The compensation value is then 
stored in flash memory. When system 100 is first enabled in 
the field, the compensation value is retrieved from memory 
and used to control the oscillator frequency. 

Watchdog timer circuitry 129 is based on a 7-bit
counter, the most significant bit of which is used to
trigger the generation of a Watchdog Reset signal. In the
preferred embodiment, this signal is generated as follows:
Time-out/Duration = 64 / Watchdog Clk frequency. For a
400 Hz CLK, time-out and reset pulse duration are 64/400 s =
160 milliseconds.

To keep the reset pulse from occurring, software must
"kick the dog" on a periodic basis by resetting the counter
and preventing the MSB from activating. The counter is reset
in the preferred embodiment by writing an Opcode into a
corresponding watchdog control register. In the preferred
embodiment, the watchdog must be "kicked" at least 2 clock
periods faster than the time-out calculation would indicate
to allow for clock synchronization and to account for
handshaking delays.
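
The time-out relation and the two-clock margin can be checked
with a short calculation. The Python sketch below is
illustrative only: the function names are hypothetical, and
treating the margin as a simple subtraction of two clock
periods from the 64-count interval is an assumption rather than
a description of the hardware.

```python
def watchdog_timeout_s(clk_hz: float, count: int = 64) -> float:
    """Time-out (and reset pulse duration) in seconds: count / clock frequency."""
    return count / clk_hz


def latest_kick_s(clk_hz: float, margin_clocks: int = 2, count: int = 64) -> float:
    """Longest interval between kicks that still leaves margin_clocks of slack.

    Assumption: the margin is modelled as whole watchdog clock periods
    subtracted from the nominal 64-count time-out.
    """
    return (count - margin_clocks) / clk_hz


timeout = watchdog_timeout_s(400.0)   # 64 / 400 Hz
deadline = latest_kick_s(400.0)       # (64 - 2) / 400 Hz
```

With a 400 Hz watchdog clock this gives a 160 millisecond
time-out and, under the stated assumption, a kick interval of
no more than 155 milliseconds.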

Watchdog timer 129 can be selectively enabled and
disabled in software by writing the appropriate Opcode into
the watchdog control register. Additionally, this block can
be hardware disabled using an external pull down resistor at
the CSn[1] pin. Moreover, the watchdog timer register can be
read to determine the cause of a reset. In particular, the
bits in this register indicate whether the reset condition
was the result of a user reset, a three key reset, a power
on reset, or a watchdog time-out.

Testing of the watchdog timer 129 is coordinated via
the Test Interface Controller (TIC) harness 1201 shown in
FIGURE 12. Registers 1201 in the APB register interface
communicate with the TIC via peripheral bus 103.
Specifically, the watchdog control register is used to
control the input multiplexer, reset status, and clock mode
and the watchdog test clock enable register is used for
generating clock enables in the register clocked test mode.
Side band input signal values are controlled by the watchdog
test input stimulus register 1203. The Watchdog Reset output
signal and the watchdog counter value can be observed at the
watchdog test output capture register 1204.

System control block 130 generally controls such central
functions as hardware test mode, clock control, power
management and system configuration management.

In addition to the JTAG testing described above,
hardware test modes are available to provide entry into an
alternate system boot routine and support specialized
testing by automatic test equipment. Among these
specialized tests are tests of the oscillator and PLLs,
tests by test interface controller (TIC) of system internal
functions through high speed bus 102, scan testing using
Automatic Test Pattern Generation, observation testing which
allows internal signals to be monitored through the Row and
Column pins to keyboard interface 118, drive all float,
drive all high and drive all low tests which cause all
output capable pins to enter either a floating, logic high
or logic low state, and an XOR tree test allowing all input
capable pins to be connected to an XOR tree.

System 100 includes two phase-locked loops (PLLs) 131 
which generate the clocks and similar timing signals 
necessary during device operation. PLLs 131 are configured 
with registers within system control block 130. Among other
things, the multiply rate, the value which determines the
number by which the reference clock is multiplied to produce 
the PLL output clock, is independently set for each PLL. 
Additionally, the output clock can be sent to an output pin 
for observation or a given PLL can be bypassed completely 
such that the output clock becomes the reference clock. 

For a more complete description of the preferred clock 
generation circuitry used in system 100, reference is now 
made to copending, coassigned patent application Serial 
Number (Attorney Docket Number 1044-EP [2836-P102US] ) . 

IDE interface 132 operates from high speed bus 102 and
supports ATAPI compliant connections to both external master
and slave IDE devices, up to PIO Mode 4, Multiword DMA Mode
2 (MDMA), and the Ultra DMA (UDMA) mode 3. In the preferred
embodiment, IDE interface 132 uses 16-bit transfers, even
during non-data transfers in the PIO mode when only 8 bits
are valid.

System 100 connects with an external ATAPI device
through a 28-pin port, one or more of these pins shared with
the General Purpose I/O port (GPIO). A brief description of
the ATAPI port is provided in TABLE 6. Preferably, IDE
Interface 132 operates asynchronously to the IDE, with all
signals synchronized to the high speed bus clock (HCLK).

TABLE 6

IDE Pin        No. Pins   Description

CS0_n          1          chip select for registers with base
                          address 1F0h

CS1_n          1          chip select for registers with base
                          address 3F0h

DA[2:0]        3          3-bit binary encoded address

DIOR_n/        1          strobe signal to read device regs or
HDMARDY_n/                data port/
HSTROBE                   flow control signal for Ultra DMA
                          data-in burst/
                          flow control signal for Ultra DMA
                          data-out burst

DIOW_n/        1          strobe signal to write device regs or
STOP                      data port/
                          terminates an Ultra DMA burst

DMACK_n        1          DMA acknowledge to DMARQ to initiate DMA
                          transfers

DASP_n         1          signal to indicate that a device is
                          active, or that Device 1 is present

DMARQ          1          DMA request for DMA to and from the
                          controller

INTRQ          1          device interrupt

IORDY/         1          negate to extend the host transfer cycle
DDMARDY_n/                of any host read or write access/
DSTROBE                   flow control signal for Ultra DMA
                          data-out burst/
                          flow control for Ultra DMA data-in burst

IOCS16_n       1          device indicates it supports 16-bit I/O
                          bus cycles

PDIAG_n/       1          asserted by device 1 to indicate to
CBLID_n                   device 0 that it has finished
                          diagnostic/
                          cable assembly type identifier

DD[15:0]       16         16-bit interface between controller and
                          device



In the PIO mode, a Pin Interface Unit handles all
operations. An IDE host uses the PIO mode for non-data and
data transfers in either direction.

For the DMA modes, data transfers are preferably made
through one of the DMACRC controllers discussed above with
respect to DMA engine 105. Moreover, both the MDMA and
UDMA modes are set up by the host using PIO operations.
Generally, the DMACRC controller performs a DMA data
transfer by: (i) requesting the AHB bus; (ii) reading the
source data into a local buffer; and (iii) requesting a write
to the destination via high speed bus 102. For host read
operation, the DMA controller attempts to keep the input
read buffer empty, while for a host write, it attempts to
keep the write buffer half full. Typical data transfers are
made to system dynamic memory and therefore are effectuated
through the SDRAM controller.

During MDMA operations, a pair of DataIn and DataOut
buffers are used for the read and write operations,
respectively. An MDMA state machine sets up the necessary
signalling, including sending the appropriate request to the
DMA controller. In the preferred embodiment, all data
transfers are 32 bits wide and are performed using two
16-bit wide IDE interface data transfers.

During an MDMA write, the DMACRC writes data to the DataOut
buffer and then the state machine toggles the write (DIOW)
strobe and drives the data on to the data (DD) bus. During
an MDMA read, the host fills the DataIn buffer by latching
data off the data bus with the read strobe (DIOR), and then
the state machine sends a request to the DMACRC controller. The
read completes when the DMACRC controller reads data out of
the DataIn buffer.

UDMA transfers are executed through a pair of 32-bit
wide, 12-entry deep buffers, namely, an input read buffer
and an output write buffer. In the preferred embodiment,
these are circular buffers set up in memory using head and
tail pointers. A UDMA state machine controls the
signalling, including the generation of requests to the DMA
controller.

During a UDMA write, a DMA request is sent to fill 4
32-bit entries in the write buffer when the number of write
buffer entries falls below 4. The UDMA state machine
controls the handshaking with the external host device. For
flow control, IDE interface 132 temporarily de-asserts the
control signal DDMARDY and the host controls the toggling of
the strobe HSTROBE.

For a read, when the read buffer has 4 or more entries
filled, a DMA request is made to the DMACRC. Flow control
in this case is controlled by the host by temporarily
deasserting DDMARDY and by IDE interface 132 by controlling the
toggling of the signal DSTROBE. The handshaking is again
controlled by the UDMA state machine.
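
The buffer arrangement described above (12-entry circular
buffers with head and tail pointers, and DMA requests keyed to
a 4-entry threshold) can be sketched in software. The Python
below is a behavioural model only; the class and function names
are hypothetical and nothing here is taken from the actual
register-level design.

```python
class RingBuffer:
    """Fixed-depth circular buffer of 32-bit words with head/tail pointers."""

    def __init__(self, depth: int = 12):
        self.slots = [0] * depth
        self.head = 0    # index of the next slot to be written
        self.tail = 0    # index of the next slot to be read
        self.count = 0   # number of entries currently held

    def push(self, word: int) -> None:
        if self.count == len(self.slots):
            raise OverflowError("buffer full")
        self.slots[self.head] = word & 0xFFFFFFFF
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1

    def pop(self) -> int:
        if self.count == 0:
            raise IndexError("buffer empty")
        word = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return word


BURST = 4  # entries moved per DMA request, per the description above


def write_needs_dma(buf: RingBuffer) -> bool:
    """UDMA write: request a refill when fewer than 4 entries remain."""
    return buf.count < BURST


def read_needs_dma(buf: RingBuffer) -> bool:
    """UDMA read: request a drain when 4 or more entries are filled."""
    return buf.count >= BURST
```

Keeping an explicit entry count alongside the two pointers is a
modelling convenience that avoids the usual full-versus-empty
ambiguity when head equals tail.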

In the preferred embodiment, data transfers are
performed using a "ping-pong" scheme, and a "grace" buffer
area is provided to account for instances where the
handshakes for pausing come at a rate lower than that at
which data are transferred.

The UDMA state machine also handles transfer
terminations, which can be initiated by either system 100 or
the associated ATAPI device coupled to system 100.
Whichever device terminates the transaction, the other
device honors the termination request and stops the
transfer. Additionally, for both reads and writes, a 16-bit
CRC result is sent to the host for checking. The CRC
registers are preloaded, as described above, with a value of
0x4ABA at the beginning of the transfer.

In the preferred embodiment, all blocks or subsystems
101 - 132 of system 100 are fabricated on a single
integrated circuit chip. This can be accomplished for
example using a 0.25 µm, four layer metal process, although
other processes known in the art can also be used. In the
illustrated embodiment, processor core 101 operates from a
2.5V nominal supply, although this may be reduced in
alternate embodiments. The peripherals in the illustrated
embodiment operate from a 3.3V supply. In this embodiment,
the nominal clock speed for processor core 101 is 200 MHz.

FIGURE 13 is a high level functional block diagram of a
math coprocessor 1300 included in the preferred embodiment
of system 100. Math coprocessor 1300 is a digital signal
processor (DSP) which operates in conjunction with
microprocessor core 101 and includes pipeline
follower/control circuitry 1301, scoreboard 1302 and
register file 1303. The primary data processing blocks
include an integer/floating point comparator (FCMP) block
1400, shown in further detail in FIGURE 14, a floating
point adder (FADD) 1500, shown in detail in FIGURE 15, and
an integer/floating point multiplier and multiply
accumulator with an integral adder (MMAC) 1600, shown in
further detail in FIGURE 16.



WSM Docket No. 2836-P101US 



Attorney Docket No. Patent 
1042- EP 


Comparator 1400, FADD 1500, and MMAC 1600 are pipelined
devices which operate in five stages (namely Decode and
Operand Fetch, Execute Stages 1-3, and Writeback).
Register file 1303 and pipeline follower 1301 are clocked
directly by the processor 101 FCLK ("fast clock"), while
adder 1500, comparator 1400, and multiplier 1600 operate
synchronously with processor 101 but at one-half the FCLK
frequency. Consequently, loads and stores between the
microprocessor registers, the memory interfaces and the math
coprocessor registers run at the full FCLK rate, but math
coprocessor computations run at half the FCLK rate (OPCLK).
In the illustrated embodiment, the five stage DSP pipeline
is not visible to the programmer since the register file is
fully scoreboarded and the pipeline is interlocked;
forwarding between pipeline stages is supported to avoid
bubbles in the pipeline that would otherwise form when the
result of an instruction must be written back to the
register file before that result can be used by the next
instruction.

In the following discussion of the preferred
embodiment, the following data types will be considered the
minimum set upon which coprocessor 1300 is able to operate:



TABLE 7

                                        Number of bits in . . .
Short Hand                              Register   Signed        Biased
Name         Data Type                  File       Significand   Exponent

f32          Single precision float     32         24            8
f64          Double precision float     64         53            11
acc          72-bit extended precision  72         61            11
i32          32-bit integer             32         32
i64          64-bit integer             64         64





Additionally, the cycle counts and latencies for each
type of data through the multiplier and adder operations
are illustrated in TABLES 8 and 9 respectively. Note that
single precision floating point and 32-bit integer
multiplication produce one result every clock cycle, while
double precision floating point and 64-bit integer
multiplication produce one result every four clock cycles.



TABLE 8

Cycle Count/Latency Through Pipeline

        f32     f64     acc     i32     i64

f32     1/5
f64             4/8
acc
i32                             1/5
i64                                     4/8



TABLE 9 





Cycle Count /Latency Through Pipeline 




f32 




f64 


acc 


i32 j 


i64 


f32 


1/5 






1/5 ! 


f 64 






1/5 


1/5 i 


acc 


1/5 




1/5 


1/5 


1/5 


1/5 


i32 




1/5 


1/5 




i64 ! 




1/5 




1/4 






The coprocessor register set preferably consists of 16
64-bit general purpose registers and four 72-bit
accumulators. For the purposes of instruction encoding, the
names of the 16 physical general purpose registers vary
according to the data type stored in them, as illustrated in
Table 10.

TABLE 10

Register Name   Data Type

F[15:0]         Single precision floating point
D[15:0]         Double precision floating point
FX[15:0]        32-bit integer
DX[15:0]        64-bit integer
AX[3:0]         72-bit Accumulator Contents



A single precision floating point number is stored in
the upper half of a 64-bit physical register; single
precision numbers must be explicitly promoted to double
precision before being used in double precision
calculations. A 32-bit integer is stored in the lower half of a
64-bit physical register and sign extended; 32-bit integers
can therefore be used directly in 64-bit integer
calculations.
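
The two storage conventions can be modelled as follows. This
Python sketch is only an illustration: the function names are
hypothetical, and filling the unused lower half of a single
precision register with zeros is an assumption, since the text
does not say what those bits hold.

```python
import struct


def pack_f32(x: float) -> int:
    """Place the IEEE-754 single precision pattern of x in bits 63:32.

    Assumption: bits 31:0 are left as zero.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 << 32


def pack_i32(n: int) -> int:
    """Place a 32-bit integer in bits 31:0 and sign extend into bits 63:32."""
    n &= 0xFFFFFFFF
    if n & 0x80000000:
        n |= 0xFFFFFFFF00000000
    return n


def as_i64(reg: int) -> int:
    """Read a 64-bit register image as a signed 64-bit integer."""
    return reg - (1 << 64) if reg & (1 << 63) else reg
```

Because `pack_i32` sign extends, reading the same register image
with `as_i64` returns the original value, which is why 32-bit
integers can feed 64-bit integer calculations without an
explicit conversion.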

The last coprocessor register is the status/control 
register. The bit description for this register is provided 
in Appendix A. 

In the preferred embodiment based on an ARM V4T
processor core architecture, five coprocessor instructions
are defined: CDP (Coprocessor Data Processing), LDC (Load
Coprocessor), STC (Store Coprocessor), MCR (Move to
Coprocessor from ARM Register), and MRC (Move to ARM
Register from Coprocessor). The formats for these five
instructions are given in Tables 11 to 15.



TABLE 11

Bits    31:28  27:24  23:22  21:20    19:16  15:12  11:8    7:5      4  3:0
Field   cond   1110   rsvd   opcode1  CRn    CRd    cp_num  opcode2  0  CRm







TABLE 12

Bits    31:28  27:25  24  23  22  21  20  19:16  15:12  11:8    7:0
Field   cond   110    P   U   N   W   1   Rn     CRd    cp_num  8-bit_word_offset

TABLE 13

Bits    31:28  27:25  24  23  22  21  20  19:16  15:12  11:8    7:0
Field   cond   110    P   U   N   W   0   Rn     CRd    cp_num  8-bit_word_offset



TABLE 14

Bits    31:28  27:24  23:22  21       20  19:16  15:12  11:8    7:5      4  3:0
Field   cond   1110   rsvd   opcode1  0   CRn    CRd    cp_num  opcode2  1  CRm






TABLE 15

Bits    31:28  27:24  23:22  21       20  19:16  15:12  11:8    7:5      4  3:0
Field   cond   1110   rsvd   opcode1  1   CRn    CRd    cp_num  opcode2  1  CRm







Bits 31:28 of each instruction are the standard ARM
condition codes; their interpretation is provided in Table
16. Note that the status flags referenced by the condition
codes are condition code flags (the upper four bits of a
program status register) of microprocessor 101.

TABLE 16

Opcode    Mnemonic
[31:28]   Extension   Meaning                   Status Flag State

0000      EQ          Equal                     Z set
0001      NE          Not Equal                 Z clear
0010      CS/HS       Carry Set/Unsigned        C set
                      Higher or Same
0011      CC/LO       Carry Clear/Unsigned      C clear
                      Lower
0100      MI          Minus/Negative            N set
0101      PL          Plus/Positive or Zero     N clear
0110      VS          Overflow                  V set
0111      VC          No Overflow               V clear
1000      HI          Unsigned Higher           C set and Z clear
1001      LS          Unsigned Lower or Same    C clear or Z set
1010      GE          Signed Greater Than       N set and V set, or N clear
                      or Equal                  and V clear (N = V)
1011      LT          Signed Less Than          N set and V clear, or N clear
                                                and V set (N != V)
1100      GT          Signed Greater Than       Z clear, and either N set and
                                                V set, or N clear and V clear
                                                (Z = 0, N = V)
1101      LE          Signed Less Than          Z set, or N set and V clear,
                      or Equal                  or N clear and V set
                                                (Z = 1 or N != V)
1110      AL          Always (unconditional)
1111      NV          Never
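
The flag tests in Table 16 can be written out as a small
lookup. The Python sketch below restates the standard ARM
condition code rules; the function name and the use of four
Boolean arguments for the N, Z, C and V flags are choices made
for illustration.

```python
def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate a 4-bit ARM condition field against the N, Z, C, V flags."""
    checks = {
        0b0000: z,                      # EQ
        0b0001: not z,                  # NE
        0b0010: c,                      # CS/HS
        0b0011: not c,                  # CC/LO
        0b0100: n,                      # MI
        0b0101: not n,                  # PL
        0b0110: v,                      # VS
        0b0111: not v,                  # VC
        0b1000: c and not z,            # HI
        0b1001: (not c) or z,           # LS
        0b1010: n == v,                 # GE
        0b1011: n != v,                 # LT
        0b1100: (not z) and (n == v),   # GT
        0b1101: z or (n != v),          # LE
        0b1110: True,                   # AL
        0b1111: False,                  # NV
    }
    return bool(checks[cond & 0xF])
```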


The other bits in the instruction formats shown above are
interpreted as follows:

(1) opcode1: DSP coprocessor-defined opcode;

(2) opcode2: DSP coprocessor-defined opcode;

(3) CRn: DSP coprocessor-defined register ID;

(4) CRd: DSP coprocessor-defined register ID;

(5) CRm: DSP coprocessor-defined register ID;

(6) Rn: Specifies an ARM base address register. These
bits are ignored by the DSP coprocessor;

(7) Rd: Specifies a source or destination ARM register.
Some DSP coprocessor instructions interpret these bits
as a coprocessor-defined register ID; most instructions
ignore these bits;

(8) cp_num: Coprocessor number;

(9) P: Pre-indexing (P=1) or post-indexing (P=0)
addressing. This bit is ignored by the DSP
coprocessor;

(10) U: Specifies whether the supplied 8-bit offset is
added to a base register (U=1) or subtracted from a
base register (U=0). This bit is ignored by the DSP
coprocessor;

(11) N: Specifies the width of a data type involved in a
move operation. [The DSP coprocessor uses this bit to
distinguish between single precision floating
point/32-bit integer numbers (N=0) and double precision
floating point/64-bit integer numbers (N=1)];

(12) W: Specifies whether or not a calculated address will
be written back to a base register (W=1) or not (W=0).
This bit is ignored by the DSP coprocessor; and

(13) 8-bit word offset: An offset used in address
calculations. These bits are ignored by the DSP
coprocessor.
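
As an illustration of how these fields sit in a 32-bit
instruction word, the following Python sketch extracts them
using the bit positions of Tables 11 through 13. The helper
names are hypothetical, and the sketch does not check the fixed
bits (27:24 or 27:25) or the coprocessor number.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return the field word[hi:lo], inclusive, as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_cdp(word: int) -> dict:
    """Split a CDP instruction word into the fields of Table 11."""
    return {
        "cond": bits(word, 31, 28),
        "opcode1": bits(word, 21, 20),
        "CRn": bits(word, 19, 16),
        "CRd": bits(word, 15, 12),
        "cp_num": bits(word, 11, 8),
        "opcode2": bits(word, 7, 5),
        "CRm": bits(word, 3, 0),
    }


def decode_ldc_stc(word: int) -> dict:
    """Split an LDC/STC instruction word into the fields of Tables 12 and 13."""
    return {
        "cond": bits(word, 31, 28),
        "P": bits(word, 24, 24),
        "U": bits(word, 23, 23),
        "N": bits(word, 22, 22),       # 0: 32-bit data, 1: 64-bit data
        "W": bits(word, 21, 21),
        "load": bits(word, 20, 20) == 1,   # bit 20 is 1 for LDC, 0 for STC
        "Rn": bits(word, 19, 16),
        "CRd": bits(word, 15, 12),
        "cp_num": bits(word, 11, 8),
        "offset": bits(word, 7, 0),
    }
```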

A preferred instruction set for math coprocessor 1300 
is provided in Appendix B. 

To illustrate the floating point operation of math
processor 1300, reference is now made to the flow chart of
FIGURE 17 and the schematics of FIGURES 14-16. Integer
operations will be discussed further in conjunction with
FIGURE 18. Generally, operations proceed through the five
stages as follows:

(1) During the Decode and Fetch Operands stage the
current coprocessor instruction is decoded and the
source operands are fetched;

(2) During Execute Stage 1 a compare instruction
executes in FCMP 1400, multiplication begins in MMAC
1600 for a multiplication instruction, and exponent
comparison and alignment begins in FADD 1500 for an
addition (subtraction) instruction;

(3) During Execute Stage 2, mantissa multiplication
and integer addition completes in MMAC 1600 for a
multiplication instruction, and addition and leading
0/1 detection completes in FADD 1500 for an addition
(subtraction) instruction;

(4) During Execute Stage 3, normalization and
rounding completes for floating point numbers in MMAC
or FADD. Saturation completes for integers in MMAC;
and

(5) During the Writeback stage, results are written
back to register file 1303.

In the example shown in FIGURE 17, the Instruction
Decode and Operands fetch stage occurs at Step 1701 where
the current instruction is decoded and operands are loaded
into the source registers.

The MMAC, FCMP, and FADD datapaths have common source
operands but distinct source registers: FCMP 1400 and FADD
1500 (source registers) are associated with the AsrcO and
BsrcO source registers (the "A and B" source registers)
while MMAC is associated with the XsrcO, YsrcO, BsrcO, and
CsrcO source registers (the "X, Y, B and C" source
registers). All of the source registers except for
registers BsrcO and CsrcO are 78 bits wide and contain the
following fields:

Mnemonic   Description

dblMant    double precision multiply flag (1 bit)
dblExp     double precision exponent flag (1 bit)
sign       floating point sign (1 bit)
exp        floating point exponent (11 bits)
mant       floating point mantissa or integer (64 bits)



MMAC's BsrcO and CsrcO source registers are used only 
for integer calculations and are 64 bits wide. 

The following pseudo-code describes how the 78-bit 
source registers are loaded from register file data at Step 
1701. Note that operands forwarded between data paths will 
already be in 78-bit format. 
dblMant <- double
dblExp <- double
sign <- ~integer AND bit[63]
exp <- ~integer AND (bit[62] & (double ? bit[61:59]
: ~bit[62]*3) & (double ? bit[58:52]
: bit[61:55])) OR integer AND ( )
mant <- ~integer AND ("01" & (double ? bit[51:0]
: (bit[54:32] & zeros(29))) & zeros(10)) OR
integer AND (bit[63:32] & (double ? bit[31:0] :
zeros(32)))

Where: "~" is the bitwise complement operator, "&" is a bit
string pasting operator, "?:" is the C language ternary
operator (used to specify a mux), "*n" specifies a bit
string created by repeating the previous bit n times, and
"zeros(n)" specifies a bit string consisting of n zeros.

Assume that the instruction calls for a compare
operation at Step 1704. In other words, neither a MMAC
operation is required at Step 1702 nor a floating point
addition (subtraction) operation at Step 1703.
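
The floating point branch of the Step 1701 source register
loading pseudo-code can be modelled in software. The Python
sketch below is an interpretation of that pseudo-code, not a
description of the circuit: the function and key names are
hypothetical, and the integer branch is omitted because the
integer exponent term is not given.

```python
def field(reg: int, hi: int, lo: int) -> int:
    """Return reg[hi:lo], inclusive, as an unsigned integer."""
    return (reg >> lo) & ((1 << (hi - lo + 1)) - 1)


def load_float_source(reg: int, double: bool) -> dict:
    """Expand a 64-bit register image into the 78-bit source register fields."""
    sign = field(reg, 63, 63)
    if double:
        exp = field(reg, 62, 52)              # 11-bit exponent used as is
        frac = field(reg, 51, 0)              # 52 fraction bits
    else:
        b62 = field(reg, 62, 62)
        mid = 0b000 if b62 else 0b111         # ~bit[62] repeated three times
        exp = (b62 << 10) | (mid << 7) | field(reg, 61, 55)
        frac = field(reg, 54, 32) << 29       # 23 fraction bits, 29 zeros
    mant = (0b01 << 62) | (frac << 10)        # "01", fraction, ten zeros
    return {
        "dblMant": int(double),
        "dblExp": int(double),
        "sign": sign,
        "exp": exp,
        "mant": mant,
    }
```

For normal single precision values the exponent widening is
equivalent to adding 896 (1023 - 127) to the 8-bit biased
exponent, so 1.0 loads with the same exponent and mantissa
fields whether it arrives as a single or a double.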

In the case of a compare instruction, the source
registers AsrcO and BsrcO are loaded either from operands
from register file 1303 or operands forwarded from either
floating point adder 1500 or MMAC 1600. Comparison
operations take place during Execute Stage 1. In this
example, at Step 1705 64-bit comparator 1401 (FIGURE 14)
compares the contents of the A and B source registers in a
single clock cycle. The complementary value of either or
both of the operands can also be taken prior to the
comparison. At Step 1706, the corresponding flag to
microprocessor core 101 is set in register 1402.

Now consider the case where the decoded instruction
calls for a floating point addition operation at Step 1703.
In this case, the source A and source B registers are loaded
with operands from either register file 1303 or forwarded
from MMAC 1600. During Execution Stage 1, the exponents
from the source A and source B entries are compared by
comparison circuitry 1501 (FIGURE 15) and a common exponent
for the addition is taken, which is preferably the larger of
the two and is associated with the "larger" mantissa (Step
1707). At Step 1708, the mantissa of the floating point
operand having the smaller exponent is realigned by a right
shift in alignment circuitry 1502 resulting in the "aligned
mantissa". Additionally, the negative sign from the sign
bits from the A and B source registers is calculated.

The exponent detectors in MMAC and FADD contain the
logic represented by the following pseudo-code:

expEQden <- ~(exp[10] OR (dblExp AND (exp[9] OR
exp[8] OR exp[7])) OR exp[6] OR exp[5] OR exp[4]
OR exp[3] OR exp[2] OR exp[1] OR exp[0])

expEQinf <- (exp[10] AND (~dblExp OR (exp[9] AND
exp[8] AND exp[7])) AND exp[6] AND exp[5] AND
exp[4] AND exp[3] AND exp[2] AND exp[1] AND
exp[0])

This logic signals a floating point zero as a denormal; the
only way to account for this is to add a 52-bit detector to
MMAC and FADD.
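
The two detector equations can be exercised directly. The
Python sketch below transcribes them bit for bit; the function
names are hypothetical, and the test values assume the 11-bit
exponent was formed as in the source register loading
pseudo-code, so that a single precision exponent of zero
appears as 896 and one of 255 appears as 1151.

```python
def exp_eq_den(exp: int, dbl_exp: bool) -> bool:
    """True when the 11-bit exponent field marks a zero/denormal operand."""
    top = (exp >> 10) & 1                  # exp[10]
    mid_any = ((exp >> 7) & 0x7) != 0      # exp[9] OR exp[8] OR exp[7]
    low_any = (exp & 0x7F) != 0            # exp[6] OR ... OR exp[0]
    return not (top or (dbl_exp and mid_any) or low_any)


def exp_eq_inf(exp: int, dbl_exp: bool) -> bool:
    """True when the 11-bit exponent field marks an infinity/NaN operand."""
    top = (exp >> 10) & 1                  # exp[10]
    mid_all = ((exp >> 7) & 0x7) == 0x7    # exp[9] AND exp[8] AND exp[7]
    low_all = (exp & 0x7F) == 0x7F         # exp[6] AND ... AND exp[0]
    return bool(top and ((not dbl_exp) or mid_all) and low_all)
```

Ignoring exp[9:7] when dblExp is clear is what lets one detector
serve both precisions: those three bits carry no information
for a widened single precision exponent.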

The transaction now enters Execution Stage 2. At Step
1709, the "larger" mantissa plus one least significant bit
and the sign extended aligned mantissa, along with the
appropriate sign bit, are added by 55-bit adder 1503. At
Step 1710, the two's complement is taken by circuitry 1504
and a leading edge 1/0 detection is made by detector 1505.
The sign for the transaction is also corrected for overflow.

During Execute Stage 3, the mantissa is normalized in
circuitry 1506 and the exponent corrected by circuitry 1507,
based on the result of the leading edge detection, at Step
1711. At Step 1712, the mantissa is rounded in circuitry
1508 and the exponent modified by ±1. At Step 1713, the
rounded mantissa is re-normalized by shifter 1510. The
corrected exponent, mantissa and transaction sign are then
concatenated at Step 1714. The result is forwarded to MMAC
1600, floating point comparator 1400 and/or onto register
file 1303 (during the write-back stage) at Step 1715.

Next, assume that the decoded instruction results in a
MMAC operation at Step 1702 and that that operation is a
floating point multiplication at Step 1716. During the
decode and fetch operand stage, the X-source (XsrcO) and
Y-source (YsrcO) registers are loaded either from the
register file 1303 or from floating point adder 1500 in
accordance with the logic described above in conjunction
with the addition operation.

During Execute Stage 1, an exclusive-OR operation of
the X and Y sign bits is performed by gate 1601 and the X
and Y exponents are summed by adder 1602 at Step 1717.
Initially, the multiplexer is set at Step 1718 such that
Ppart = 0. Additionally, the multiplication of the mantissas
in 32-bit by 32-bit two's complement multiplier array 1603
begins during Execute Stage 1.

MMAC 1600 can perform either a single precision
multiplication of two 32-bit floating point numbers or a
double precision multiplication of two 64-bit floating point
numbers (Step 1719). Consider first the case of a single
precision multiplication. In this case, the signed 32-bit
X- and Y-mantissas are multiplied in 32-bit by 32-bit two's
complement multiplier array 1603 at Step 1720.
Subsequently, during Execute Stage 2, the partial sum and
partial carry from multiplier array 1603 along with the
contents of the register Ppartl (currently set to zero) are
added by a 72-bit fixed point adder 1604 (Step 1721). The
results of the addition can be shifted to the right by a
72-bit shift register 1605 at Step 1722. At Step 1723, the
shifted result is rounded in rounding circuitry 1606, the
exponent adjusted by ±1 by circuitry 1607 as a function of
the rounding operation and the mantissa re-normalized by
shifting in circuitry 1608 and saturated in circuitry 1609.
At Step 1724, the exponent, sign and mantissa are
concatenated and the result is forwarded to FCMP 1400, FADD
1500, and/or register file 1303 at Step 1725.

Now consider the case where a double precision
multiplication is required at Step 1719. The double
precision multiplication process requires five clock cycles.
First, at Step 1726, the unsigned lower 32 bits of the
mantissas in the X- and Y-source registers are multiplied in
array 1603. This step is preferably performed using
multiplexers at the multiplier array inputs. The output of
the array, the associated carry bit and the contents of the
register Ppartl (which is zero for the first clock cycle)
are added by fixed-point adder 1604 at Step 1727.

At Step 1728, the output of adder 1604, shifted right
by 32 bits and sign extended, is selected as the new value for
Ppartl. Then, at Step 1729, the unsigned lower 32 mantissa
bits from the X-source register are multiplied with the
unsigned upper 32 mantissa bits from the Y-source register.
The output from the multiplier array, including the carry
bit, is added to the contents of the Ppartl register at Step
1730. At Step 1731, the 72-bit output from adder 1604 is
selected as the new contents of Ppartl.

The next multiplication, at Step 1732, is performed
between the signed upper 32 mantissa bits in the X-source
register and the unsigned lower 32 mantissa bits in the
Y-source register. The result of the multiplication,
including the carry bit, is then added to Ppartl at Step
1733. The new value for Ppartl is selected at Step 1734 to
be the sum output from adder 1604 shifted right by 32 bits.
The final multiplication in the multiplication array takes
place at Step 1735 where the signed upper 32 mantissa bits
from the X-source register and the signed upper 32 mantissa
bits from the Y-source register are multiplied. The results
of the multiplication, including the carry bit, are then
summed at Step 1736.
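
The four-pass accumulation of Steps 1726 through 1736 can be
checked arithmetically. The Python sketch below is a
simplification: it treats both operands as unsigned 64-bit
values, whereas the text describes signed upper halves, and the
function name and the variable `ppart` (standing in for the
partial product register) are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul64_high(x: int, y: int) -> int:
    """Upper 64 bits of a 64x64 product built from four 32x32 products.

    Assumption: x and y are unsigned 64-bit integers.
    """
    x_lo, x_hi = x & MASK32, x >> 32
    y_lo, y_hi = y & MASK32, y >> 32

    ppart = 0
    ppart = (x_lo * y_lo + ppart) >> 32   # pass 1: low x low, keep sum >> 32
    ppart = x_lo * y_hi + ppart           # pass 2: low x high, keep full sum
    ppart = (x_hi * y_lo + ppart) >> 32   # pass 3: high x low, keep sum >> 32
    return x_hi * y_hi + ppart            # pass 4: high x high, final sum
```

The two 32-bit right shifts discard exactly the low 64 bits of
the full 128-bit product, so the value returned equals the
product shifted right by 64 bits.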

The double precision multiplication procedure continues as
was done with the single precision procedure with Steps
1722-1725 where the adder output is selectively shifted,
rounded, saturated and re-normalized and then forwarded for
additional operations in the floating point adder or
floating point comparator or onto the register file.

The floating point unit also executes instructions for
determining the absolute value of a floating point operand,
negating a floating point operand, converting an integer
into floating point form, and converting a double precision
operand to a single precision operand. These operations are
performed as follows in the preferred embodiment.

The floating point absolute value operation decision is
made at Step 1737. The corresponding signalling NaN (Not a
Number) is input at Step 1738. Then, at Steps 1739 and
1740, the invalid flag is set and the sign bit is set to
zero. The procedure next jumps to Step 1725 where the
mantissa and exponent are concatenated with the new sign.

For a floating point negate operation, at Step 1741,
the signalling NaN is input at Step 1742. The invalid flag
is set at Step 1743 and the sign bit is inverted at Step
1744. This procedure then also jumps to Step 1725.
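
The sign handling in the absolute value and negate operations can be sketched on raw 64-bit patterns. This is a minimal illustration, not the circuit of the preferred embodiment: the function names are hypothetical, the invalid flag is assumed to be raised only when the operand is a signalling NaN, and a signalling NaN is assumed to be a NaN whose most significant fraction bit is clear.

```python
import struct

SIGN_MASK = 1 << 63
EXP_MASK = 0x7FF << 52
FRAC_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51  # assumption: set for quiet NaNs, clear for signalling NaNs


def is_signalling_nan(bits):
    """True if the 64-bit pattern is a NaN whose quiet bit is clear."""
    is_nan = (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0
    return is_nan and (bits & QUIET_BIT) == 0


def fp_abs(bits):
    """Return (result_bits, invalid_flag); the sign bit is forced to zero."""
    return bits & ~SIGN_MASK, is_signalling_nan(bits)


def fp_neg(bits):
    """Return (result_bits, invalid_flag); the sign bit is inverted."""
    return bits ^ SIGN_MASK, is_signalling_nan(bits)


def to_bits(value):
    """Pack a Python float into its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def from_bits(bits):
    """Unpack a 64-bit IEEE 754 pattern into a Python float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

In both cases the exponent and mantissa pass through unchanged, which matches the description of concatenating them with the new sign at Step 1725.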

To convert an integer to a floating point value at Step
1745, a determination is first made as to whether the
operand is a 32-bit or 64-bit integer (Step 1746). In the
case of a 32-bit integer, the operand is sign extended at
Step 1747 to 64 bits. The initial biased exponent is set at
Step 1748 to 1084. At Step 1749 the first operand (Op1)
presented to the adder is taken as the 64-bit value and the
second adder operand (Op2) is taken as zero. The procedure
jumps to Step 1710 and these two operands are added as was
described above for the floating point addition operation.
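
One way to read the initial biased exponent of 1084 is as 1023 + 61, that is, the double precision bias plus an assumed binary point 61 bits above the least significant bit of the 64-bit operand. That reading is an inference and is not stated in the text. Under that assumption, the following sketch replaces the floating point adder path with an explicit re-normalization loop; the function names are hypothetical and the result is truncated rather than rounded.

```python
import struct

INITIAL_BIASED_EXPONENT = 1084           # value given for Step 1748
POINT = INITIAL_BIASED_EXPONENT - 1023   # assumed binary point position (bit 61)


def int_to_double_bits(value):
    """Convert a signed 64-bit integer to IEEE 754 double bits (truncating)."""
    if value == 0:
        return 0
    sign = 1 if value < 0 else 0
    mag = -value if value < 0 else value
    exponent = INITIAL_BIASED_EXPONENT
    # Re-normalize so that the leading one sits at the assumed binary point.
    while mag >= (1 << (POINT + 1)):
        mag >>= 1
        exponent += 1
    while mag < (1 << POINT):
        mag <<= 1
        exponent -= 1
    # Drop the implicit leading one and keep the top 52 fraction bits.
    fraction = (mag - (1 << POINT)) >> (POINT - 52)
    return (sign << 63) | (exponent << 52) | fraction


def from_bits(bits):
    """Unpack a 64-bit IEEE 754 pattern into a Python float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

For example, the integer 1 is shifted left 61 places, leaving a biased exponent of 1023 and a zero fraction, which is the double precision encoding of 1.0.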

At Step 1751, the execution of the double precision to
single precision operation is illustrated. First a
determination is made as to whether the mantissa is too
large or too small, and if so the corresponding flag is set
(Step 1752). The input operand is rounded at Step 1752 to
single precision. The process again jumps to Step 1735.

Finally, if at Step 1751 the decoded instruction does
not invoke a double to single precision conversion, then at
Step 1754 a value of 896 is added to the exponent and at
Step 1755 the 29 least significant bits of the mantissa are
set to zero and the new value forwarded.
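
The two constants in this last step follow from the IEEE 754 formats: 896 is the difference between the double precision exponent bias (1023) and the single precision bias (127), and 29 is the difference between the 52-bit and 23-bit fraction fields. The sketch below applies them to widen a single precision pattern to double precision format. It is an illustration with hypothetical function names, and it assumes a normalized operand (zeros, denormals, infinities and NaNs would need separate handling).

```python
import struct


def widen_single_to_double_bits(bits32):
    """Re-encode a normalized single precision pattern in double format."""
    sign = (bits32 >> 31) & 1
    exp8 = (bits32 >> 23) & 0xFF
    frac23 = bits32 & 0x7FFFFF
    exp11 = exp8 + 896        # re-bias the exponent: 1023 - 127
    frac52 = frac23 << 29     # the 29 new low-order mantissa bits are zero
    return (sign << 63) | (exp11 << 52) | frac52


def widen(value):
    """Round value to single precision, then widen it back bit by bit."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    bits64 = widen_single_to_double_bits(bits32)
    (result,) = struct.unpack(">d", struct.pack(">Q", bits64))
    return result
```

Because the widened fraction only gains zero bits, any value that is exactly representable in single precision comes back unchanged.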

FIGURE 18 is a flow chart describing exemplary integer
operations in MMAC 1600. Assume first that at Step 1801,
the decoded instruction calls for an addition or subtraction
operation. In the illustrated embodiment, MMAC 1600 can
perform either 64-bit double precision or 32-bit single
precision arithmetic operations. For a 32-bit operation at
Step 1802, the 32-bit sign extended integers from the B and
C source registers (CsrcO and BsrcO) are multiplexed at
Step 1803 to the inputs of 72-bit fixed point adder 1604.
For a 64-bit addition or subtraction, the 64-bit contents
of both the B and C source registers are switched to the
inputs of adder 1604. The addition or subtraction operation
takes place at Step 1806.

The immediate result from adder 1604 can then be
shifted left or right by shifter 1605 at Step 1807. The
result of the addition (subtraction) can be saturated and
rounded at Step 1808 and the result forwarded at Step 1809
to the floating point adder, the floating point comparator,
and/or the register file.
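
The integer add/subtract path (sign extension, addition in a wide adder, optional shift, saturation) can be sketched as follows. This is an illustration under stated assumptions rather than the circuit itself: the function names are hypothetical, subtraction is assumed to compute B minus C, saturation is assumed to clamp to the signed range of the operand width, and rounding is omitted.

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits bits of value as a two's complement integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return value - (1 << from_bits) if value & sign_bit else value


def saturate(value, bits):
    """Clamp value to the signed range representable in the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def int_add_sub(b, c, width=32, subtract=False, shift=0):
    """Add or subtract two width-bit operands, then shift and saturate.

    A positive shift is a left shift; a negative shift is a right shift.
    """
    op_b = sign_extend(b, width)
    op_c = sign_extend(c, width)
    # The exact sum of two 64-bit operands needs 65 bits, so it fits in 72.
    total = op_b - op_c if subtract else op_b + op_c
    total = total << shift if shift >= 0 else total >> -shift
    return saturate(total, width)
```

The headroom in a 72-bit adder is what allows the overflow to be detected and saturated after the addition instead of wrapping.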

Next, consider the case where the decoded instruction
calls for multiplication at Step 1801. With respect to
integer multiplications, MMAC 1600 can operate on either
32-bit or 64-bit data. Assume first that 32-bit operands
are being processed at Step 1810.

The multiplexers are set at Step 1810 such that the
register Ppart is loaded with zeros. Then, at Step 1812,
the signed 32-bit X- and Y-integers are multiplied in two's
complement multiplier array 1603. Fixed point adder 1604
then adds the sum and the carry bits from multiplier array
1603 and the contents of register Ppart at Step 1813.

In the preferred instruction set, provided herein as
Appendix B, additional operations can be performed on 32-bit
operands during the same instruction cycle. Among the
instructions provided are 32-bit integer multiply-add;
32-bit integer multiply-subtract; 32-bit integer
multiply-add, result to accumulator; 32-bit integer
multiply-subtract, result to accumulator; 32-bit integer
multiply-add to accumulator; and 32-bit integer
multiply-subtract from accumulator instructions. These
operations are represented in FIGURE 18 by Steps 1814-1817.
At Steps 1814 and 1815, a 32-bit addition or
subtraction takes place. The source for the add-in register
can be one of the accumulators 1610 (FIGURE 16), or one of
the C or B source registers. For instructions requiring
storage in the accumulator at Step 1816, the accumulation
takes place at Step 1817. Thereafter, the procedure can
jump to Steps 1807-1809 where the result can be selectively
shifted, and/or saturated and rounded and then forwarded to
the register file or to another functional unit within the
math coprocessor.
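
The multiply-add and multiply-subtract variants can be sketched as a signed 32-bit product combined with a third operand taken from a source register or an accumulator. The function names are hypothetical, the 72-bit accumulator width is an assumption borrowed from the width of adder 1604, and the operand order of the subtraction (product minus operand) follows the wording of Claims 23 and 25.

```python
ACC_BITS = 72  # assumed accumulator width, matching the 72-bit adder


def wrap(value, bits=ACC_BITS):
    """Wrap value to a signed two's complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def to_s32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    return wrap(value, 32)


def multiply_add(x, y, operand):
    """Signed 32x32 product plus an operand (source register or accumulator)."""
    return wrap(to_s32(x) * to_s32(y) + operand)


def multiply_subtract(x, y, operand):
    """Signed 32x32 product minus an operand, per the wording of the claims."""
    return wrap(to_s32(x) * to_s32(y) - operand)


def dot_product(xs, ys):
    """Chain multiply-add to accumulator over two equal-length sequences."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = multiply_add(x, y, acc)
    return acc
```

Chaining the accumulate form, as in `dot_product`, is the typical use: each step feeds the previous accumulator value back in as the third operand.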

In the case of a 64-bit integer multiplication at Step
1810, the register Ppart is loaded with zeros (Step 1818).
At Step 1819, the unsigned lower 32 bits of the X- and
Y-integers are multiplied in the multiplier array, and the
results Psuml and Pcarryl are then added to the contents of
Ppart at Step 1820. Next, the output of adder 1604 is
shifted right by 32 bits and becomes the new value in
register Ppart (Step 1821).

At Step 1822, the unsigned lower 32 integer bits from
the X-source register and the unsigned upper 32 integer bits
from the Y-source register are multiplied in the multiplier
array. The result Psuml, along with the carry bit and the
contents of the Ppartl register, are then added in the fixed
point adder at Step 1823. At Step 1824, the new value
loaded into register Ppartl is the adder output.

Next, the signed upper 32 bits from the X-source
register and the unsigned lower 32 bits from the Y-source
register are multiplied at Step 1825. The partial sum and
carry bit are added along with the contents of Ppartl at
Step 1826. At Step 1827, the output from the adder is
shifted right by 32 bits and becomes the new value stored in
register Ppartl.

Finally, the signed upper 32 bits from the X-source
register and the signed upper 32 bits from the Y-source
register are multiplied at Step 1828. The results Psuml and
Pcarryl are added to the contents of register Ppartl at
Step 1829. The results can then be processed through
Execute State 3 (i.e., Steps 1807-1809).
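
The four-pass sequence of Steps 1818-1829 can be checked with a short sketch that builds the upper part of a signed 64-bit by 64-bit product from four 32-bit by 32-bit partial products, shifting the running sum right by 32 bits after the first and third passes. The function names are hypothetical. The sketch treats the upper half of each operand as signed in every pass, including the pass at Step 1822 that the text describes with an unsigned upper half; that is an assumption made so the identity also holds for negative operands.

```python
MASK32 = (1 << 32) - 1


def split(value):
    """Split a signed 64-bit integer into (signed upper, unsigned lower) halves."""
    return value >> 32, value & MASK32


def mul64_upper(x, y):
    """Upper part of x * y (the product shifted right by 64 bits)."""
    x_hi, x_lo = split(x)
    y_hi, y_lo = split(y)
    ppart = 0                               # Step 1818: clear Ppart
    ppart = (ppart + x_lo * y_lo) >> 32     # Steps 1819-1821: low x low, shift
    ppart = ppart + x_lo * y_hi             # Steps 1822-1824: low x high, no shift
    ppart = (ppart + x_hi * y_lo) >> 32     # Steps 1825-1827: high x low, shift
    return ppart + x_hi * y_hi              # Steps 1828-1829: high x high
```

Each right shift discards 32 low-order bits of the running sum, so the value returned equals the full product shifted right by 64 bits; the two middle passes share the same weight, which is why no shift separates them.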

In the preferred embodiment, where processor core 101
is based on an ARM920T device, the assembly language
programming of the math coprocessor (DSP) is accomplished
via macro pseudo-instructions that wrap the underlying ARM
coprocessor load/store and execute instructions. These
macros are supported by the ARM SDT 2.50 assembler, the GNU
tool set's gas assembler, and the Microsoft/BSquare Windows
CE assembler. C/C++ compiler support for the DSP is used
for the floating point subset of the DSP instruction set and
access to the integer MAC unit will be provided via
C-callable assembly language. One C/C++ compiler which
preferably supports the math coprocessor is the
GNUPro/EGCS/gcc compiler from Cygnus Solutions.

Although the invention has been described with
reference to specific embodiments, these descriptions are
not meant to be construed in a limiting sense. Various
modifications of the disclosed embodiments, as well as
alternative embodiments of the invention, will become
apparent to persons skilled in the art upon reference to the
description of the invention. It should be appreciated by
those skilled in the art that the conception and the
specific embodiment disclosed may be readily utilized as a
basis for modifying or designing other structures for
carrying out the same purposes of the present invention. It
should also be realized by those skilled in the art that
such equivalent constructions do not depart from the spirit
and scope of the invention as set forth in the appended
claims.

It is, therefore, contemplated that the claims will
cover any such modifications or embodiments that fall within
the true scope of the invention.

WHAT IS CLAIMED:

1. A mathematics coprocessor comprising:
    a multiplier-accumulator unit comprising:
        a multiplier array for selectively multiplying
        first and second operands, the first and second
        operands having a data type selected from the group
        including floating point and integer data types;
        an adder for selectively performing addition and
        subtraction operations on third and fourth operands;
        and
        multiplexer circuitry for selectively presenting
        the third and fourth operands to inputs of the adder,
        the multiplexer circuitry selecting the third and
        fourth operands from the contents of a set of
        associated source registers, data output from the
        multiplier array and data output from the adder.

2. The coprocessor of Claim 1 wherein the third and fourth
operands comprise integers.

3. The coprocessor of Claim 1 wherein said multiply-
accumulate unit is operable during a double precision
multiplication to:
    multiply an unsigned first set of bits from a first
    source register with an unsigned first set of bits from a
    second source register to generate a first product and
    first carry bit;
    add the first product and first carry bit with a first
    constant to generate a first sum;
    multiply the unsigned first set of bits from the first
    source register with an unsigned second set of bits from
    the second source register to generate a second product
    and second carry bit;
    add the second product and carry bit with the first sum
    to generate a second sum;
    multiply a signed second set of bits from the first
    source register with the unsigned first set of bits from
    the second register to generate a third product and carry
    bit;
    add the second sum with the third product and carry bit
    to generate a third sum;
    multiply the signed second set of bits from the first
    source register with the signed second set of bits from
    the second source register to generate a fourth product;
    and
    add the fourth product with the third sum, the third
    sum being selectively shifted, to generate a product of
    the contents of the first and second source registers.

4. The coprocessor of Claim 3 wherein the first set of
bits from the first and second source registers each
comprise an upper set of bits from integers stored in the
first and second registers.

5. The coprocessor of Claim 3 wherein the first set of
bits from the first and second source registers comprise an
upper set of bits from mantissas stored in the first and
second registers.

6. The coprocessor of Claim 3 wherein the second set of
bits from the first and second source registers comprise a
lower set of bits from integers stored in the first and
second source registers.

7. The coprocessor of Claim 3 wherein the second set of
bits from the first and second source registers comprise a
lower set of bits from mantissas stored in the first and
second source registers.

8. The coprocessor of Claim 1 and further comprising a
floating point comparator for selectively comparing operands
presented in a set of source registers.

9. The coprocessor of Claim 1 and further comprising a
floating point adder for performing floating point addition
and subtraction operations on operands presented in a set of
source registers.

10. A digital signal processor comprising:
    a multiplier-accumulator for performing integer and
    floating point multiplication and integer addition
    operations on operands selectively fetched into a set of
    source registers;
    a floating point adder for performing floating point
    addition operations on operands selectively fetched into
    the set of source registers; and
    a comparator for comparing floating point operands
    selectively fetched into the set of source registers.

11. The digital signal processor of Claim 10 wherein said
multiplier-accumulator unit comprises:
    a multiplier array for selectively multiplying floating
    point mantissas and integers;
    a fixed point adder for selectively performing
    addition operations on data including integers received
    from the set of source registers and products generated
    by the multiplier array; and
    an accumulator including a register for accumulating
    results generated by the fixed point adder.

12. The digital signal processor of Claim 11 wherein said
multiplier-accumulator further comprises a shift register
for selectively shifting data including operands received
from the set of source registers and results generated by
the fixed point adder.

13. The digital signal processor of Claim 11 wherein said
digital signal processor comprises a math coprocessor
operating in conjunction with a microprocessor.

14. The digital signal processor of Claim 11 wherein said 
digital signal processor comprises a coprocessor operating 
in conjunction with a reduced instruction set computer. 



15. The digital signal processor of Claim 11 wherein said
multiplier-accumulator further comprises circuitry for
selectively forwarding results directly to said floating
point adder to prevent pipeline bubbles.

16. The digital signal processor of Claim 11 wherein said
floating point adder comprises circuitry for selectively
forwarding results directly to said multiplier-accumulator
to prevent pipeline bubbles.

17. The digital signal processor of Claim 11 wherein said
multiplier-accumulator comprises:
    a multiplier array for multiplying first and second
    operands during a first clock period;
    a fixed point adder for adding a result from said
    multiplier array with a third operand during a second
    clock period; and
    an accumulator register for storing a sum output from
    said adder during the second clock period.

18. A method of performing arithmetic operations in a
multiplier operable to perform both integer and floating
point operations comprising the steps of:
    in response to a first instruction, performing a single
    precision multiplication of first and second signed
    floating point operands comprising the substeps of:
        adding exponents of the first and second operands;
        multiplying a signed mantissa of each of the
        operands in a multiplier array to generate a product
        and a carry bit;
        adding the partial product and carry bit with a
        constant using a fixed point adder to generate an
        intermediate result;
        selectively rounding and renormalizing the
        intermediate result; and
    in response to a second instruction, performing a
    single precision multiplication of first and second
    integers comprising the substeps of:
        multiplying the signed first and second integers
        in the multiplier array to generate a product and a
        carry bit;
        adding the product and carry bit with a constant
        using the fixed point adder to generate an
        intermediate result; and
        selectively rounding and renormalizing the
        intermediate result.

19. The method of Claim 18 and further comprising the step
of performing a double precision multiplication of first and
second floating point operands in response to a third
instruction comprising the substeps of:
    adding exponents of the first and second operands;
    multiplying unsigned lower bits of a mantissa of the
    first operand with unsigned lower bits of a mantissa of
    the second operand in the multiplier array to generate a
    first partial product and a carry bit;
    adding the first partial product and carry bit with a
    constant using the fixed point adder to generate a first
    intermediate result;
    multiplying the unsigned lower bits of the mantissa of
    the first operand with unsigned upper bits of the
    mantissa of the second operand in the multiplier array to
    generate a second partial product and second carry bit;
    selectively shifting the first intermediate result by a
    selected shift count;
    adding the second partial product and second carry bit
    with the shifted first intermediate result using the
    fixed point adder to generate a second intermediate
    result;
    multiplying signed upper bits of the mantissa of the
    first operand with the unsigned lower bits of the
    mantissa of the second operand in the multiplier array to
    generate a third partial product and third carry bit;
    adding the third partial product and third carry bit
    with the second intermediate result using the fixed point
    adder to generate a third intermediate result;
    multiplying the signed upper bits of the mantissa of
    the first operand with the signed upper bits of the
    mantissa of the second operand in the multiplier array to
    generate a fourth partial product and fourth carry bit;
    selectively shifting the third intermediate result by a
    selected shift count;
    adding the fourth partial product and fourth carry bit
    with the shifted third intermediate result using the
    fixed point adder to generate a fourth intermediate
    result; and
    selectively rounding and renormalizing the fourth
    intermediate result to generate a final product.

20. The method of Claim 18 and further comprising the step
of performing a double precision multiplication on first and
second signed integers comprising the substeps of:
    multiplying unsigned lower bits of the first and second
    integers in the multiplier array to generate a first
    partial product and first carry bit;
    adding the first partial product and first carry bit
    with a constant using the fixed point adder to generate a
    first intermediate result;
    multiplying the unsigned lower bits of the first
    integer and unsigned upper bits of the second integer in
    the multiplier array to generate a second product and
    second carry bit;
    selectively shifting the first intermediate result by a
    selected shift count;
    adding the second partial product and second carry bit
    with the shifted first intermediate result using the
    fixed point adder to generate a second intermediate
    result;
    multiplying signed upper bits of the first integer with
    the unsigned lower bits of the second integer in the
    multiplier array to generate a third partial product and
    third carry bit;
    adding the second intermediate result with the third
    partial product and third carry bit using the fixed point
    adder to generate a third intermediate result;
    multiplying the signed upper bits of the first and
    second integers in the multiplier array to generate a
    fourth partial product and fourth carry bit;
    shifting the third intermediate result by a selected
    shift count;
    adding the shifted third intermediate result with the
    fourth partial product and fourth carry bit in the fixed
    point adder to generate a fourth intermediate result; and
    selectively rounding and renormalizing the fourth
    intermediate result to generate a final product.



21. The method of Claim 18 and further comprising the step
of adding first and second integers in the multiplier in
response to a third instruction comprising the steps of:
    presenting the first and second integers to
    corresponding inputs of the fixed point adder forming a
    portion of the multiplier; and
    adding the first and second integers with the fixed
    point adder.

22. The method of Claim 18 wherein the multiplier further
includes at least one accumulator and said step of
performing a single precision integer multiplication
further comprises the substeps of:
    adding a third integer to the intermediate result to
    generate a sum; and
    storing the sum in the accumulator.

23. The method of Claim 18 wherein the multiplier further
includes at least one accumulator and said step of
performing a single precision integer multiplication further
comprises the substeps of:
    subtracting a third integer from the intermediate
    result to generate a result; and
    storing the result in the accumulator.

24. The method of Claim 18 wherein the multiplier further
includes at least one accumulator and said step of
performing a single precision integer multiplication further
comprises the substeps of:
    adding the intermediate result to a value stored in an
    accumulator; and
    storing the result of said substep of adding in an
    accumulator.

25. The method of Claim 18 wherein the multiplier further
includes at least one accumulator and said step of
performing a single precision integer multiplication further
comprises the substeps of:
    subtracting a value stored in an accumulator from the
    intermediate result; and
    storing the result of said substep of subtracting in an
    accumulator.

26. An instruction set for operating a processor including
a multiplier array, a fixed point adder and a floating point
adder comprising:
    a first set of instructions for multiplying first and
    second operands, at least some bits of each of said first
    and second operands multiplied in said multiplier array
    and a result of the multiplication added to a third value
    by the fixed point adder;
    a second set of instructions for adding first and
    second integers using said fixed point adder; and
    a third set of instructions for adding first and second
    floating point values in said floating point adder.

27. The instruction set of Claim 26 wherein said first set
of instructions comprises:
    at least one instruction for multiplying first and
    second integer operands using said multiplier array and
    said fixed point adder; and
    at least one instruction for multiplying first and
    second floating point operands using said multiplier
    array and said fixed point adder.

28. The instruction set of Claim 26 wherein said first set
of instructions comprises:
    at least one instruction for performing a double
    precision multiplication of said first and second
    operands; and
    at least one instruction for performing a single
    precision multiplication of said first and second
    operands.



29. The instruction set of Claim 26 and further comprising
a set of instructions for converting data between first and
second data types.



30. The instruction set of Claim 26 wherein said first data 
type comprises floating point data and said second data type 
comprises integer data. 



31. The instruction set of Claim 26 wherein said first data 
type comprises single precision data and said second data 
type comprises double precision data. 



32. The instruction set of Claim 26 and further comprising
a set of instructions for shifting data in a selected
direction by a selected number of bits.

33. The instruction set of Claim 26 and further comprising
a set of instructions for selectively comparing first and
second floating point numbers in a floating point comparator
circuit.



34. The instruction set of Claim 26 and further comprising 
a set of instructions for taking an absolute value of a 
selected operand. 



35. The instruction set of Claim 26 and further comprising 
a set of instructions for negating the value of a selected 
operand. 

MATH COPROCESSOR

ABSTRACT OF THE DISCLOSURE

A math coprocessor 1300 includes a multiply-accumulate
unit 1600. Multiply-accumulate unit 1600 includes a
multiplier array 1603 for selectively multiplying first and
second operands, the first and second operands having a data
type selected from the group including floating point and
integer data types. An adder 1604 selectively performs
addition and subtraction operations on third and fourth
operands, the third and fourth operands selected by
multiplexer circuitry from the contents of a set of
associated source registers, data output from multiplier
array 1603 and data output from adder 1604.




WSM Docket No. 2836-P101US 



[FIG. 1: block diagram of the system-on-a-chip. Legible block
labels: ARM920T core with 16KB I-Cache, 16KB D-Cache and WinCE
MMU; 32-bit LCD I/F; Raster/Graphics Engine; SDRAM I/F;
SRAM/Flash/ROM/PCMCIA interface; 1/10/100 Ethernet; JTAG I/F and
TIC I/F; USB Host (3 ports); Boot ROM; 8 Channel DMA Engine;
UARTs w/ HDLC; AHB/APB Bridge; WatchDog Timer; RTC; 16-bit timers
(4); 32-bit Timer; UART w/ IrDA; PLLs (2); LED Drivers (2); 8x8
Key Scan I/F; 6-bit Contrast DAC; Enhanced GPIO; EEPROM/SPI I/F;
AC97 I/F; Interrupt Controller.]



/ 



301 



CRCO 
(LFSR& LOGIC) 



CONFIG 



RESOURCE 



REQUEST 



DMA CHANNEL 0 



300 



CRC1 
(LFSR& LOGIC) 



CONFIG CONFIG 



DMA CHANNEL I 



*» 4 



CONFIG I I CONFIG 



CRC2 
(LFSR& LOGIC) 



DMA CHANNEL 2 



CRC3 
(LFSR& LOGIC) 



DMA CHANNEL 3 



- DMA CHANNEL 4 



- DMA CHANNEL 5 



in 



317 



303 



304 



305 



GLOBAL CONFIG^ 33 



8-WAY 
ARBITER 



--, 3ot 



TEST 
INTERFACE 



mux' 



322 
310 



0 



311 



105 



AHB 
BUS 
MASTER 
MACRO 




AHB 
REG 
SLAVE 
MACRO 



DMA Y iZ8 
CONFIG 



30* 



DMA CHANNEL 6 



DMA CHANNEL 7 



307 



. AMBA 
^AHB 



5fc 



&fa 3A 



<41.-E.fi 



M 
m 



o 

on 



If 



CO 

cs 

CO 

x 



> <J 
< < 

f/i eC 

if 



V) 



S 



o 

(4 



N 

Si 

In 



2 

H 



N 

i 

5^ 



<|i| 



2 



Z 

u 
o 



a. 



>2 

< 




3n 



2 



2 



_ 2 



% «« 2 

? D * 

2 = k 

= 3 

uj H U) 

H « OS 



8 



±\ 



CD 

3=1 
< | 

It 



03 

s s 



0 



O 
m 
m 

u 
m 
m 

O 

m 

o 

m 
o 

o 



o 



-1 



S 



x s 

* 1 



N 

c 



IE 



o 

ID 

m 

u 
m 
in 

Si 

C: 

m 
a 

a 

G 




fo «y z-ep 




« — . — » 

H17 



f 




aa — £ I 
S-QQ<£ 



Z 



s 




O 
•I 



-i 

a. 

2 

O 

y 

a* 

u 

2 

P 



V} < Sfc 




FIXED 
3 2 USEC 
DELAY 
1 1/3 IFG| 


6.4 L 
DEL 
)2/3l 


TIMER 
IPLETE 



ii 

x- 



fART DEFERRAL 
•ELDIS ISSETI 


O 
M 


2d 


Wl 


FIXED 
9.6 USEC 
IFO DELAY 





IT) 



2§ 



*2 




+ 



I o hi -bp 



RECEIVE DESCRIPTOR 
QUEUE BASE ADDRESS 
RxDBA(32) 
— » 



RECEIVER 
DESCRIPTER 
QUEUE BASE 
LENGTH 
(RxDBL) 



RxBufAdfO (32) 


NOT 
SOF 


BUFFER 
INDEX 0(15) 


BUFFER 
LENGTH 0(16) 


RxBufAdr 1 (32) 


NOT 
SOF 

(1) 


BUFFER 
INDEX (IS) 


FRAME 
LENGTH (16) 


RxBu£tdr2(32) 


NOT 
SOF 


BUFFER 
INDEX (15) 


FRAME 
LENGTH (16) 




• 






• 
• 




RxBufAdrk(32) 


NOT 
SOF 

J1L 


BUFFER 
INDEX (15) 


FRAME 
LENGTH (16) 



DATA 
BUFFER 
0 



BUFFER 

0 

LENGTH 

IN 
BYTES 



DATA 
BUFFER 

I 



BUFFER 
I 

LENGTH 

IN 
BYTES 



DATA 
BUFFER 

2 



BUFFER 
2 

LENGTH 

IN 
BYTES 



• 




• 




• 




DATA 


T 


BUFFER 




K 









BUFFER 
K 

LENGTH 

[N 
BYTES 



/ BUFFER LENGTH 

0TO64Kbyte» 
• IN MULTIPLES OF 4-BYTES 



3%p. 5E 



RECEIVE STATUS 
QUEUE BASE — 
ADDRESS (32) 
(FOR SBA) 



BITS 31-0 



C ■ CURRENT FRAME 



RECEIVE STATUS 
CURRENT — 
ADDRESS (32) 
(RxSCA) 



R 
F 
P 


STATUS (31) 


R 

F 
P 


BUFFER 
INDEX (45) 


FRAME 
LENGTH (16) 


o 

F 
P 


STATUS (31) 


R 

F 
P 


BUFFER 
INDEX (15) 


FRAME 
LENGTH (16) 




• 
• 
• 


R 
F 
P 


STATUS (31) 


F 
P 


BUFFER 
INDEX (15) 


FRAME 
LENGTH (16) 


I 
» 


STATUS (31) 


R 

• 

> 


BUFFER 
INDEX (15) 


FRAME 
LENGTH (16) 




« 
• 




> 


STATUS (31) 


R 

• 

» 


BUFFER 
INDEX (15) 


FRAME 
LENGTH (16) 



RECEIVE STATUS 
QUEUE BASE 
LENGTH (1 6) 
(FOR SBL) 



t 



+ 



d. 

DEVICE DRIVER 



PROTOCOL 

STACK 

RECVCALL 



RECEIVE RECEIVE 
DESCRIPTOR FRAME 
QUEUE 'DATA 



RECEIVE 
STATUS 
QUEUE 




RECEIVE 
DESCRIPTOR 
REGISTERS 



RxDEQ 



RxSEQ 



Sit 



REC 
DESCI 
PROC 

j 


ETVE 
UPTOR 
ESSOR 


M> 

ENC 
it 


\C 
ilNE 

>? 



LAN 
MEDIUM 



< 

EX. 

O 
V3 



a. 





UJ 

-J 
O 




M 



u e 
>u 



M 



U3 — 
> W 

g2 



C/3 

<- a. 

Is 

a 



DC U. 




o 

UJ 

a 



to 




+ 



FILTER TAPS: 

PROMISCUOUSA 
IAHASHA 
MULTICASTA 
INDrVTDUALA 
BROADCASTA 



INCOMING FRAME 



1 


SHO 




DESTINATION 
ADDRESS 
JILTER 


IF THE FILTER IS 
NOT PASSED 






1HE FRAME IS " 
DISCARDED 











FILTER PASSED 



ACCEPT MASK: 

CRCRUNTA 
RUNTA 



5WI 









NOT 




ACCEPT (A) 


PASSED 




MASK 






ACCEPT MASK 


1 


PASSED 



FRAME 
DISCARDED 



STATUS IN RXEVENT 
AND THE FRAME BODY 
IS ACCEPTED INTO THE CHIP 



FRAME IS PASSED 
TO HOST MEMORY 
BY DESCRIPTOR 
PROCESSOR 



IE-MASK TAPS: 

RECETVESTQIE 
ENDOFSREAMEE 







5H2 












IE 






MASK 





INTERRUPT 
IF MASK OK 



5J 



+ 






BUFFER 
LENGTH 0(12) 




BUFFER 
LENGTH 1 (12) 




BUFFER 
LENGTH 2 (12) 






BUFFER 
LENGTH n (12) 


0(32) 


BUFFER 
Ord0(l4) 


<N 


BUFFER 
Otd 1 (14) 


r*l 

<N 


BUFFER 
Old 2 (14) 


• • • 




BUFFER 
Ordn(l4) 


TxBufAdr 




5 








e 

•n 




BUFFER 
INDEX 0(15) 


a 

a 


BUFFER 
INDEX 1 (IS) 


2 

3 

a 


BUFFER 
INDEX 2 (IS) 




TxBufA 


BUFFER 
INDEX n (15) 




08 




o£ 













0= e 

m 



/o V transmit status Queue 

" BITS 31-0 



TRANSMIT 
STATUS 
BASE 
ADDRESS 
(TxSBA) (32) 



STATUS 0 



STATUS 1 



STATUS 2 



• 



TRANSMIT-* 
STATUS 
CURRENT 
ADDRESS 
(TxSCA)(32) 



CURRENT FRAME STATUS 



NEXT STATUS POSITION 



• 



STATUS M 



31 



30 



FRAME STATUS (15) 



BUFFER INDEX (15) 



-TxWE» TRANSMITTED WITHOUT ERROR 
■TxFP- TRANSMIT FRAME PROCESSED 



PROTOCOL STACK 
XMITCALL 



SH3 



TX COMPLETE 



DEVICE DRIVER 



Tx TRANSMIT 
DESCRIPTOR FRAME 
QUEUE DATA 




TRANSMIT 
DESCRIPTOR 
REGISTERS 



TxDEQ n. Si* 



TRANSMIT 
DESCRIPTOR 
PROCESSOR 



MAC 
ENGINE 



<o7 



LAN 
MEDIUM 




^ r • i 



Inflate 
IV Doocriptcr 
andStatt* a»SS2 



WiteTxDBQ 

WthvalU <vH3 
dBscriplor ooiirt 



Random ttfrthQ 



I 



FIG. 5 0 



W» o, 551 



ReadTk *v 55*f 



Rod 7k 



Ul 



RMdlk 



IIS 



R*d7* 



! FtadTk 
I Oh 



tad 7k 



SvtiRaroO 



^ » b^M^ 4 



ut 



7k 



7k 



Tx 



r 



7k 



r 

I 



TkSMu 



J 



r 



r 



r 

(j^ 1/3 CO 



1 



1 



+ + » • 



+ 



/6 Ul-fS't 0 




[Figure: touch screen scan states: TOUCH DETECT, SAMPLE X-AXIS, SAMPLE Y-AXIS, DISCHARGE ALL LINES. A/D converter with IN, REF+, and REF- inputs; VBAT.]

[Figure: scan flowchart. SCAN X-AXIS: discharge all for preset settling time; apply voltage to X-axis; delay for preset settling time; take 4, 8, 16, or 32 samples, storing max, min, and average; set X INT PENDING, XLAST=X. SCAN Y-AXIS: discharge all for preset settling time; apply voltage to Y-axis; delay for preset settling time; take 4, 8, 16, or 32 samples, storing max, min, and average; set Y INT PENDING, YLAST=Y; set interrupt.]

[Figure: A/D converter input configurations (IN, REF+, REF-, WIPER, VDD, VBAT) for TOUCH PRESS, SAMPLE X-AXIS, SAMPLE Y-AXIS, DISCHARGE ALL LINES, and SAMPLE VBAT.]

[Figure: timer block diagram. Blocks: APB bus, APB interface, prescaler (prescale carry), count, Timers 1-5, time stamp debug, test input stimulus register, test output capture register, interrupt signals.]

PATENT APPLICATION 
File Number: 1042-EP 



DECLARATION AND POWER OF ATTORNEY 
Original Application 
of the specification, including the claims, as amended by any amendment specifically
referred to in this Declaration, that the information given herein is true, that I believe 
that I am the original, first and joint inventor of the invention entitled: 

MATH COPROCESSOR 
which is described and claimed in: 
XX the attached specification or 

the specification in application Serial No. filed 

that I acknowledge my duty to disclose information in accordance with 37 C.F.R.
Section 1.56 and defined on the attached sheet, which is material to the examination of
this application, that I do not know and do not believe the same was ever known or 
used in the United States of America before my or our invention thereof or patented or 
described in any printed publication in any country before my or our invention thereof, 
or more than one year prior to this application, that the invention has not been patented 
or made the subject of an inventor's certificate issued before the date of this application 
in any country foreign to the United States of America on an application filed by me or 
my legal representatives or assigns more than twelve months prior to this application 
and that as to applications for patent or inventor's certificate filed by me or my legal 
representatives or assigns in any country foreign to the United States of America, the 
earliest filed foreign application(s) filed within twelve months prior to the filing date of 
this application and all foreign applications filed more than twelve months prior to the 
filing date of this application, if any, are identified below. 
Required information as to foreign applications filed prior to the filing date of this
application is on page attached hereto and made a part hereof.

Section 1.56 Duty to Disclose Information Material to Patentability

(a) A patent by its very nature is affected with a public interest. The public 
interest is best served, and the most effective patent examination occurs when, at the 
time an application is being examined, the Office is aware of and evaluates the 
teachings of all information material to patentability. Each individual associated with the 
filing and prosecution of a patent application has a duty of candor and good faith in 
dealing with the Office, which includes a duty to disclose to the Office all information
known to that individual to be material to patentability as defined in this section. The 
duty to disclose information exists with respect to each pending claim until the claim is 
canceled or withdrawn from consideration, or the application becomes abandoned. 
Information material to the patentability of a claim that is canceled or withdrawn from 
consideration need not be submitted if the information is not material to the patentability 
of any claim remaining under consideration in the application. There is no duty to 
submit information which is not material to the patentability of any existing claim. The 
duty to disclose all information known to be material to patentability is deemed to be 
satisfied if all information known to be material to patentability of any claim issued in a 
patent was cited by the Office or submitted to the Office in the manner prescribed by 
Sections 1.97(b)-(d) and 1.98. However, no patent will be granted on an application in 
connection with which fraud on the Office was practiced or attempted or the duty of 
disclosure was violated through bad faith or intentional misconduct. The Office 
encourages applicants to carefully examine:

(1) Prior art cited in search reports of a foreign patent office in a
counterpart application, and

(2) The closest information over which individuals associated with the
filing or prosecution of a patent application believe any pending claim patentably 
defines, to make sure that any material information contained therein is 
disclosed to the Office. 

(b) Under this section, information is material to patentability when it is not
cumulative to information already of record or being made of record in the application,
and

(1) It establishes, by itself or in combination with other information, a
prima facie case of unpatentability of a claim; or

(2) It refutes, or is inconsistent with, a position the applicant takes in:

(i) Opposing an argument of unpatentability relied on by the Office, or

(ii) Asserting an argument of patentability.

A prima facie case of unpatentability is established when the information compels a
conclusion that a claim is unpatentable under the preponderance of evidence,
burden-of-proof standard, giving each term in the claim its broadest reasonable
construction consistent with the specification, and before any consideration is given to
evidence which may be submitted in an attempt to establish a contrary conclusion of
patentability.

(c) Individuals associated with the filing or prosecution of a patent application
within the meaning of this section are:

(1) Each inventor named in the application;

(2) Each attorney or agent who prepares or prosecutes the application; and

(3) Every other person who is substantively involved in the preparation 
or prosecution of the application and who is associated with the inventor, with the 
assignee or with anyone to whom there is an obligation to assign the application. 

(d) Individuals other than the attorney, agent or inventor may comply with this 
section by disclosing information to the attorney, agent or inventor. 

APPENDIX A

DSPSC 

Default: 0x00000000 
Definition: DSP Status/Control 
Bit Definitions: 

DAID[2:0] DSP Architecture ID. This read-only value will be incremented for
each revision of the overall DSP coprocessor architecture (future revisions
may support multiple MAC units, dedicated
0 0 0 = First architecture.

HVID[2:0] Hardware Version ID. This read-only value will be incremented each
time the hardware implementation of the architecture named by DAID[2:0] is
changed, typically done in response to bugs.
0 0 0 = First version.

V Overflow Flag. Indicates that an integer operation overflowed.
0 = No overflow (reset default).
1 = Overflow.

RM[1:0] Rounding Mode. Selects IEEE 754 rounding mode.
0 0 = Round to nearest (reset default).
0 1 = Round toward 0.
1 0 = Round to -∞.
1 1 = Round to +∞.
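
As a minimal sketch of how software might read and change the RM[1:0] field, the C fragment below encodes the four rounding modes listed above. The field position (`DSPSC_RM_SHIFT`) and all helper names are assumptions made for illustration; this appendix excerpt does not state where RM[1:0] sits within DSPSC.

```c
#include <stdint.h>

/* RM[1:0] encodings exactly as listed in the DSPSC description. */
typedef enum {
    RM_NEAREST = 0x0, /* 0 0: round to nearest (reset default) */
    RM_ZERO    = 0x1, /* 0 1: round toward 0 */
    RM_NEG_INF = 0x2, /* 1 0: round to -infinity */
    RM_POS_INF = 0x3  /* 1 1: round to +infinity */
} dsp_round_mode;

/* Hypothetical field position: not taken from the source. */
#define DSPSC_RM_SHIFT 10u
#define DSPSC_RM_MASK  (0x3u << DSPSC_RM_SHIFT)

/* Return a copy of the register image with RM[1:0] replaced by rm. */
static uint32_t dspsc_set_rm(uint32_t dspsc, dsp_round_mode rm)
{
    return (dspsc & ~DSPSC_RM_MASK) | ((uint32_t)rm << DSPSC_RM_SHIFT);
}

/* Extract RM[1:0] from a register image. */
static dsp_round_mode dspsc_get_rm(uint32_t dspsc)
{
    return (dsp_round_mode)((dspsc & DSPSC_RM_MASK) >> DSPSC_RM_SHIFT);
}

/* Human-readable name for a rounding mode. */
static const char *rm_name(dsp_round_mode rm)
{
    switch (rm) {
    case RM_NEAREST: return "round to nearest";
    case RM_ZERO:    return "round toward 0";
    case RM_NEG_INF: return "round to -infinity";
    case RM_POS_INF: return "round to +infinity";
    }
    return "unknown";
}
```

Because the reset default of DSPSC is 0x00000000, `dspsc_get_rm(0)` yields `RM_NEAREST`, which agrees with the "reset default" annotation on the 0 0 encoding.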

IXE Inexact Trap Enable. Enables/disables software trapping for IEEE 754
inexact exceptions.
0 = Disable software trapping for inexact exceptions (reset default).
1 = Enable software trapping for inexact exceptions.

UFE Underflow Trap Enable. Enables/disables software trapping for IEEE 754
underflow exceptions.
0 = Disable software trapping for underflow exceptions (reset default).
1 = Enable software trapping for underflow exceptions.

OFE Overflow Trap Enable. Enables/disables software trapping for IEEE 754
overflow exceptions.
0 = Disable software trapping for overflow exceptions (reset default).
1 = Enable software trapping for overflow exceptions.

DZE Divide By Zero Trap Enable. Enables/disables software trapping for
IEEE 754 divide by zero exceptions.
0 = Disable software trapping for divide by zero exceptions (reset default).
1 = Enable software trapping for divide by zero exceptions.

IOE Invalid Operator Trap Enable. Enables/disables software trapping for
IEEE 754 invalid operator exceptions.
0 = Disable software trapping for invalid operator exceptions (reset default).
1 = Enable software trapping for invalid operator exceptions.



IX Inexact. Status bit that's set when an IEEE 754 inexact exception occurs
(regardless of whether or not software trapping for inexact exceptions is
enabled). Writing a '1' to this position clears the status bit.
0 = No inexact exception detected (reset default). Writing a '0' to this bit
is ignored.
1 = Inexact exception detected. Writing a '1' to this bit clears it.

UF Underflow. Status bit that's set when an IEEE 754 underflow exception
occurs (regardless of whether or not software trapping for underflow
exceptions is enabled). Writing a '1' to this position clears the status bit.
0 = No underflow exception detected (reset default). Writing a '0' to this
bit is ignored.
1 = Underflow exception detected. Writing a '1' to this bit clears it.

OF Overflow. Status bit that's set when an IEEE 754 overflow exception occurs
(regardless of whether or not software trapping for overflow exceptions is
enabled). Writing a '1' to this position clears the status bit.
0 = No overflow exception detected (reset default). Writing a '0' to this bit
is ignored.
1 = Overflow exception detected. Writing a '1' to this bit clears it.

DZ Divide By Zero. Status bit that's set when an IEEE 754 divide by zero
exception occurs (regardless of whether or not software trapping for divide
by zero exceptions is enabled). Writing a '1' to this position clears the
status bit.
0 = No divide by zero exception detected (reset default). Writing a '0' to
this bit is ignored.
1 = Divide by zero exception detected. Writing a '1' to this bit clears it.

IO Invalid Operator. Status bit that's set when an IEEE 754 invalid operator
exception occurs (regardless of whether or not software trapping for invalid
operator exceptions is enabled). Writing a '1' to this position clears the
status bit.
0 = No invalid operator exception detected (reset default). Writing a '0' to
this bit is ignored.
1 = Invalid operator exception detected. Writing a '1' to this bit clears it.
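
The write-one-to-clear behavior of the five status bits, and their independence from the trap enables, can be illustrated with a small software model. This is a sketch only: the bit positions, the fixed offset between each status bit and its enable bit, and the function names are all hypothetical, since this excerpt of the appendix does not give the bit layout of DSPSC.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
#define DSPSC_IO (1u << 0) /* invalid operator status */
#define DSPSC_DZ (1u << 1) /* divide by zero status */
#define DSPSC_OF (1u << 2) /* overflow status */
#define DSPSC_UF (1u << 3) /* underflow status */
#define DSPSC_IX (1u << 4) /* inexact status */
#define DSPSC_STATUS_MASK \
    (DSPSC_IO | DSPSC_DZ | DSPSC_OF | DSPSC_UF | DSPSC_IX)

/* Hypothetical: each trap enable (IOE..IXE) sits 5 bits above its status bit. */
#define DSPSC_ENABLE_SHIFT 5u

/*
 * Model of the hardware recording an exception. The status bit is set
 * whether or not trapping is enabled. Returns 1 if the matching trap
 * enable is set (a software trap would be taken), otherwise 0.
 */
static int dspsc_raise(uint32_t *dspsc, uint32_t status_bit)
{
    status_bit &= DSPSC_STATUS_MASK;
    *dspsc |= status_bit;
    return (*dspsc & (status_bit << DSPSC_ENABLE_SHIFT)) != 0;
}

/*
 * Model of a software write to the status bits only: a written '1'
 * clears the bit, a written '0' is ignored. Enable bits and other
 * fields are left untouched by this helper.
 */
static uint32_t dspsc_write_status(uint32_t dspsc, uint32_t value)
{
    return dspsc & ~(value & DSPSC_STATUS_MASK);
}
```

In this model a handler would typically read the register image, act on whichever status bits are set, and then write the same bits back to clear them; writing zero leaves every pending status bit in place.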



WSM Docket No. 2836-P101US
Attorney Docket No. 1042-EP
Patent

APPENDIX B

Instruction Set



Notes:

1) Shaded fields in instruction encodings indicate bits that are ignored by
the DSP coprocessor.

2) The <cond> portion of each instruction mnemonic refers to the standard ARM
conditional mnemonic extension (see the second column of Table 15).

3) Coprocessor number 4 is associated with single and double precision
floating point numbers.

4) Coprocessor number 5 is associated with 32- and 64-bit integers.

5) Coprocessor number 6 is associated with the accumulators and the DSPSC
register.
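
The coprocessor-number assignments in the notes above can be sketched as a small decoder. In the standard ARM coprocessor instruction formats, the condition field occupies bits 31:28 and the coprocessor number occupies bits 11:8. The enum and function names below are hypothetical, and the sketch assumes its input is already known to be a coprocessor instruction; it does not check the opcode bits.

```c
#include <stdint.h>

/* Data classes from the notes; the names are illustrative only. */
typedef enum {
    DSP_CLASS_FLOAT,      /* coprocessor 4: single/double precision floating point */
    DSP_CLASS_INTEGER,    /* coprocessor 5: 32- and 64-bit integers */
    DSP_CLASS_ACC_STATUS, /* coprocessor 6: accumulators and DSPSC register */
    DSP_CLASS_NONE        /* any other coprocessor number */
} dsp_class;

/* Coprocessor number: bits 11:8 of an ARM coprocessor instruction. */
static unsigned arm_cp_num(uint32_t insn)
{
    return (insn >> 8) & 0xFu;
}

/* Condition field (<cond>): bits 31:28 of an ARM instruction. */
static unsigned arm_cond(uint32_t insn)
{
    return insn >> 28;
}

/* Map the coprocessor number to the data class named in the notes. */
static dsp_class dsp_classify(uint32_t insn)
{
    switch (arm_cp_num(insn)) {
    case 4:  return DSP_CLASS_FLOAT;
    case 5:  return DSP_CLASS_INTEGER;
    case 6:  return DSP_CLASS_ACC_STATUS;
    default: return DSP_CLASS_NONE;
    }
}
```

For example, the word 0xEE000400 carries condition 0xE (always) and coprocessor number 4, so `dsp_classify` reports the floating point class.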



